
Autumn Recruitment Cheat Sheet C

2022-06-23 01:22:00 Shallow look

Exercise 1: Rows to columns

Suppose the competition results are stored as follows.
Create the competition result table row_to_col:

create table row_to_col
(cdate DATE,
 result varchar(32) not null);

insert into row_to_col values(20210101,'win');
insert into row_to_col values(20210101,'loss');
insert into row_to_col values(20210103,'win');
insert into row_to_col values(20210103,'loss');
insert into row_to_col values(20210101,'win');
insert into row_to_col values(20210103,'loss');

Use SQL to convert the results into one row per date, with one column for the number of wins and one column for the number of losses.

Approach:

  1. Group the rows by cdate.
  2. Use count together with if to count the 'win' and 'loss' games for each day.

The SQL statement is as follows:

select cdate as match_date,
       count(if(result='win',true,null)) as win,
       count(if(result='loss',true,null)) as loss
from row_to_col
group by cdate

The result is as follows:

match_date   win   loss
2021-01-01   2     1
2021-01-03   1     2
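
The same pivot can also be written with SUM and CASE, which does not rely on MySQL's if() function. The block below is only a sketch: row_to_col_demo is a hypothetical table created so the example runs on its own, and the values 'win' and 'loss' are assumed labels for the two outcomes.

```sql
-- Hypothetical demo table (not part of the original post).
DROP TABLE IF EXISTS row_to_col_demo;
CREATE TABLE row_to_col_demo (
    cdate  DATE        NOT NULL,
    result VARCHAR(32) NOT NULL
);

INSERT INTO row_to_col_demo (cdate, result) VALUES
    ('2021-01-01', 'win'),
    ('2021-01-01', 'loss'),
    ('2021-01-03', 'win'),
    ('2021-01-03', 'loss'),
    ('2021-01-01', 'win'),
    ('2021-01-03', 'loss');

-- Conditional aggregation: each CASE yields 1 for a matching row and 0 otherwise,
-- so SUM counts the matching rows inside every date group.
SELECT cdate AS match_date,
       SUM(CASE WHEN result = 'win'  THEN 1 ELSE 0 END) AS wins,
       SUM(CASE WHEN result = 'loss' THEN 1 ELSE 0 END) AS losses
FROM row_to_col_demo
GROUP BY cdate
ORDER BY cdate;
```

With the six demo rows, 2021-01-01 has 2 wins and 1 loss, and 2021-01-03 has 1 win and 2 losses.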

Exercise 2: Columns to rows

Suppose the results are stored as follows.
Create the competition result table col_to_row:

create table col_to_row
(match_date date,
 win integer(4) not null,
 loss integer(4) not null,
 primary key(match_date));

insert into col_to_row values(20210101,2,1);
insert into col_to_row values(20210103,1,2);

Approach:
I ran out of time. I have thought about this one for a long time without finding a good solution, so I am leaving it open here and will come back to it when I get the chance.
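
One common way to turn columns back into rows is to select each column separately and stack the results with UNION ALL. The block below is a sketch of that idea and is not the original author's solution. The table col_to_row_demo and its column names are hypothetical, and the target shape (one row per date and outcome, with the count in a games column) is an assumption, since the expected output is not given in the text.

```sql
-- Hypothetical demo table (not part of the original post).
DROP TABLE IF EXISTS col_to_row_demo;
CREATE TABLE col_to_row_demo (
    match_date DATE NOT NULL PRIMARY KEY,
    wins       INT  NOT NULL,
    losses     INT  NOT NULL
);

INSERT INTO col_to_row_demo (match_date, wins, losses) VALUES
    ('2021-01-01', 2, 1),
    ('2021-01-03', 1, 2);

-- Unpivot: one SELECT per source column, stacked with UNION ALL.
-- The literal in the result column records which column the count came from.
SELECT match_date, 'win'  AS result, wins   AS games FROM col_to_row_demo
UNION ALL
SELECT match_date, 'loss' AS result, losses AS games FROM col_to_row_demo
ORDER BY match_date, result;
```

This returns four rows, one per date and outcome. Expanding each count back into one row per game would additionally need a join against a number sequence; that step is left out here.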

Exercise 3: Consecutive logins

Problem:
There is a user behavior table t_act_records with two fields: uid (user ID) and imp_date (date).

  1. For each month of 2021, compute the maximum number of consecutive login days per user.
  2. For each month of 2021, list the users who logged in on 2 consecutive days.
  3. For each month of 2021, count the users who logged in on 5 consecutive days.

Create table t_act_records:
DROP TABLE if EXISTS t_act_records;
CREATE TABLE t_act_records
(uid  VARCHAR(20),
imp_date DATE);

INSERT INTO t_act_records VALUES('u1001', 20210101);
INSERT INTO t_act_records VALUES('u1002', 20210101);
INSERT INTO t_act_records VALUES('u1003', 20210101);
INSERT INTO t_act_records VALUES('u1003', 20210102);
INSERT INTO t_act_records VALUES('u1004', 20210101);
INSERT INTO t_act_records VALUES('u1004', 20210102);
INSERT INTO t_act_records VALUES('u1004', 20210103);
INSERT INTO t_act_records VALUES('u1004', 20210104);
INSERT INTO t_act_records VALUES('u1004', 20210105);

Approach:

  1. Pick a reference date earlier than every date in the table, and use the datediff function to compute the number of days between each login date and the reference date.
  2. For each user, number the login dates in ascending order, then subtract that sequence number from the day count from step 1. Call the difference ranking.
  3. When a user's login dates are consecutive, the day count and the sequence number both increase by 1 from one row to the next, so the difference ranking stays the same.
  4. Group by month, user (uid) and ranking. The row count of each group is the length of one run of consecutive days within that month.
  5. Sort by the number of consecutive days in descending order with order by to find the maximum number of consecutive login days.

This approach follows the article: MySQL continuous date statistics: calculating the number of consecutive days
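
To see why the difference stays constant inside a run, it helps to look at the intermediate values. The sketch below uses a hypothetical table t_act_records_demo with one user who logs in on two consecutive days and then again after a gap. It uses ROW_NUMBER() instead of rank(); the two give the same numbering as long as each user has at most one row per date.

```sql
-- Hypothetical demo table (not part of the original post).
DROP TABLE IF EXISTS t_act_records_demo;
CREATE TABLE t_act_records_demo (
    uid      VARCHAR(20),
    imp_date DATE
);

INSERT INTO t_act_records_demo (uid, imp_date) VALUES
    ('u1003', '2021-01-01'),
    ('u1003', '2021-01-02'),
    ('u1003', '2021-01-05');

-- days_since_ref grows by 1 per calendar day, seq grows by 1 per row.
-- Their difference (grp) only changes when a day is skipped.
SELECT uid,
       imp_date,
       DATEDIFF(imp_date, '2020-01-01') AS days_since_ref,
       ROW_NUMBER() OVER (PARTITION BY uid ORDER BY imp_date) AS seq,
       DATEDIFF(imp_date, '2020-01-01')
         - ROW_NUMBER() OVER (PARTITION BY uid ORDER BY imp_date) AS grp
FROM t_act_records_demo
ORDER BY uid, imp_date;
```

The three rows have days_since_ref 366, 367, 370 and seq 1, 2, 3, so grp is 365, 365, 367: the first two dates share a group and the third starts a new one.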

SQL statement:

select month(imp_date) as login_month,
       uid,
       min(imp_date) as start_date,
       max(imp_date) as end_date,
       count(*) as consecutive_days
from (select uid, imp_date,
             datediff(imp_date,'2020-01-01') - rank() over (partition by uid order by imp_date) as ranking
      from t_act_records) as r
group by uid, month(imp_date), r.ranking
order by consecutive_days desc

This returns every run of consecutive login days for each user in each month. The longest run per user and month is the answer to question 1.

Questions 2 and 3 only need a where condition on top of this result. Note that the query above has to be wrapped as a derived table first (aliased p here). SQL evaluates from, then where, then select, so the alias consecutive_days does not exist yet when the where clause of the same query runs. The conditions use >= rather than =, so that a user with a longer run also counts as having logged in on 2 or 5 consecutive days.

where p.consecutive_days >= 5
--
where p.consecutive_days >= 2
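
Put together, the three questions could be answered along the following lines. This is a sketch under a few assumptions: it runs against the t_act_records table created above, the helper table monthly_streaks and all column aliases are hypothetical names, "N consecutive days" is read as "a run of at least N days", and duplicate rows for the same user and date are removed before numbering.

```sql
-- Hypothetical helper table: one row per user, month and run of consecutive days.
DROP TEMPORARY TABLE IF EXISTS monthly_streaks;
CREATE TEMPORARY TABLE monthly_streaks AS
SELECT uid,
       MONTH(imp_date) AS login_month,
       COUNT(*)        AS consecutive_days
FROM (SELECT uid,
             imp_date,
             DATEDIFF(imp_date, '2020-01-01')
               - ROW_NUMBER() OVER (PARTITION BY uid ORDER BY imp_date) AS grp
      FROM (SELECT DISTINCT uid, imp_date          -- one row per user and day
            FROM t_act_records
            WHERE imp_date >= '2021-01-01'         -- restrict to 2021
              AND imp_date <  '2022-01-01') AS d
     ) AS g
GROUP BY uid, MONTH(imp_date), grp;

-- Question 1: longest run per user and month.
SELECT login_month, uid, MAX(consecutive_days) AS max_consecutive_days
FROM monthly_streaks
GROUP BY login_month, uid
ORDER BY login_month, uid;

-- Question 2: users with a run of at least 2 days in a month.
SELECT DISTINCT login_month, uid
FROM monthly_streaks
WHERE consecutive_days >= 2
ORDER BY login_month, uid;

-- Question 3: number of users with a run of at least 5 days in a month.
SELECT login_month, COUNT(DISTINCT uid) AS users_with_5_day_run
FROM monthly_streaks
WHERE consecutive_days >= 5
GROUP BY login_month
ORDER BY login_month;
```

On the sample data, question 1 gives 1, 1, 2 and 5 days for u1001 to u1004 in month 1, question 2 lists u1003 and u1004, and question 3 counts 1 user. Months in which nobody reaches the threshold do not appear in the output of questions 2 and 3.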

Exercise 4: What causes data skew in Hive, and how can it be optimized?

Causes:
1) Keys are unevenly distributed
2) Characteristics of the business data itself
3) Poor table design
4) Some SQL statements are inherently prone to data skew
See the article for details: Causes and solutions of data skew in Hive
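
On the optimization side, a few Hive settings are commonly mentioned for skewed GROUP BY and JOIN workloads. The fragment below is not taken from the original post; it only lists settings that exist in Hive's configuration, and whether any of them helps depends on the Hive version, the execution engine and the actual data.

```sql
-- Partial aggregation on the map side, so fewer rows reach the reducers.
SET hive.map.aggr = true;

-- Two-stage GROUP BY: rows are first spread randomly across reducers for a
-- partial aggregate, then merged by the real grouping key in a second job.
SET hive.groupby.skewindata = true;

-- Handle heavily repeated join keys separately in a follow-up map join.
SET hive.optimize.skewjoin = true;

-- Row count above which a join key is treated as skewed
-- (100000 is the documented default).
SET hive.skewjoin.key = 100000;
```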

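Exercise 5: Can a LEFT JOIN return more rows than the left table? Why?

Yes, it can. Every row of A is paired with every row of B that satisfies the join condition, so a name that appears several times in B repeats the matching row of A that many times.
Run the SQL statement: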

SELECT * 
FROM A 
LEFT JOIN B 
on A.name = B.name

The result has more rows than A whenever a name in A matches more than one row in B.
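
A small self-contained case makes the row growth visible. The tables left_demo and right_demo and their contents are hypothetical and only serve this sketch.

```sql
-- Hypothetical demo tables (not part of the original post).
DROP TABLE IF EXISTS left_demo;
DROP TABLE IF EXISTS right_demo;
CREATE TABLE left_demo  (name VARCHAR(20), age  INT);
CREATE TABLE right_demo (name VARCHAR(20), city VARCHAR(20));

INSERT INTO left_demo  (name, age)  VALUES ('alice', 30), ('bob', 25);
INSERT INTO right_demo (name, city) VALUES ('alice', 'Paris'), ('alice', 'Rome');

-- The left table has 2 rows.
SELECT COUNT(*) AS left_rows FROM left_demo;

-- 'alice' matches two rows on the right and is returned twice;
-- 'bob' has no match and is returned once with city = NULL.
SELECT l.name, l.age, r.city
FROM left_demo AS l
LEFT JOIN right_demo AS r
       ON l.name = r.name;
```

The join returns 3 rows from a 2-row left table. If the extra rows are unwanted, the right side has to be reduced to one row per key first, for example by aggregating or deduplicating it in a subquery.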

References:
MySQL continuous date statistics: calculating the number of consecutive days
Causes and solutions of data skew in Hive
Source of the exercises:
Datawhale team learning

Original site

Copyright notice
This article was written by [Shallow look]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/173/202206220918142609.html