R for Data Science (note) -- data transformation (select basic use)
2022-06-24 19:23:00 【Shengxin Xiaopeng】

Tidy-style data processing is widely used in scientific research. I think this has a lot to do with the pipe operator %>% and the data-processing verbs.
Solving the most important and most common problems in the least amount of time is what I call efficiency; working through the remaining difficulties is what I call improvement.
Using the select verb
The first thing to be clear about:
filter operates on rows, whereas select operates on columns.
The previous note covered filter; this one covers select.
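The row/column distinction can be seen on a toy example. This is a minimal sketch, assuming dplyr is installed; the data frame `df` and its column names are made up for illustration and do not come from the note.

```r
library(dplyr)

# A tiny made-up data frame, used only for illustration
df <- data.frame(
  id    = 1:4,
  group = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)

# filter() keeps the rows that satisfy a condition; every column is retained
rows_kept <- filter(df, value > 15)

# select() keeps the named columns; every row is retained
cols_kept <- select(df, id, value)

dim(rows_kept)  # fewer rows, same number of columns
dim(cols_kept)  # same number of rows, fewer columns
```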
### Hands-on practice
Again, select picks columns by name, and the column names do not need quotation marks.
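A short sketch of the unquoted style, next to the case where the names are already stored as strings. It uses the built-in mtcars data rather than flights so that it runs without extra packages; the variable `wanted` is a hypothetical name, and `all_of()` is the tidyselect helper that dplyr re-exports for character vectors.

```r
library(dplyr)

# Bare (unquoted) column names, as described in the text
a <- select(mtcars, mpg, cyl)

# When the names live in a character vector, wrap the vector in all_of()
wanted <- c("mpg", "cyl")  # hypothetical variable, for illustration only
b <- select(mtcars, all_of(wanted))

names(a)
names(b)
```

Both calls should pick the same two columns; the unquoted form is simply the more convenient one for interactive work.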
### 1. The data
We again use the data from the nycflights13 package.
library(nycflights13)
library(dplyr)
flights
#> # A tibble: 336,776 x 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#> <int> <int> <int> <int> <int> <dbl> <int> <int>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> # … with 336,770 more rows, and 11 more variables: arr_delay <dbl>,
#> # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
#> # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
### 2. Selecting columns
select can pick columns by individual name, by a range written with ":", or by exclusion written with "-".
# Select columns by name
select(flights, year, month, day)
#> # A tibble: 336,776 x 3
#> year month day
#> <int> <int> <int>
#> 1 2013 1 1
#> 2 2013 1 1
#> 3 2013 1 1
#> 4 2013 1 1
#> 5 2013 1 1
#> 6 2013 1 1
#> # … with 336,770 more rows
# Select all columns between year and day (inclusive)
select(flights, year:day)
#> # A tibble: 336,776 x 3
#> year month day
#> <int> <int> <int>
#> 1 2013 1 1
#> 2 2013 1 1
#> 3 2013 1 1
#> 4 2013 1 1
#> 5 2013 1 1
#> 6 2013 1 1
#> # … with 336,770 more rows
# Select all columns except those from year to day (inclusive)
select(flights, -(year:day))
#> # A tibble: 336,776 x 16
#> dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier
#> <int> <int> <dbl> <int> <int> <dbl> <chr>
#> 1 517 515 2 830 819 11 UA
#> 2 533 529 4 850 830 20 UA
#> 3 542 540 2 923 850 33 AA
#> 4 544 545 -1 1004 1022 -18 B6
#> 5 554 600 -6 812 837 -25 DL
#> 6 554 558 -4 740 728 12 UA
#> # … with 336,770 more rows, and 9 more variables: flight <int>, tailnum <chr>,
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> # minute <dbl>, time_hour <dttm>
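The range form and its negation split the columns into two complementary groups (3 + 16 = 19 for flights above). The sketch below checks the same idea on the built-in mtcars data, so it does not need nycflights13; the object names `first_three` and `the_rest` are made up for illustration.

```r
library(dplyr)

# A range is inclusive at both ends
first_three <- select(mtcars, mpg:disp)

# Negating the range keeps everything outside it
the_rest <- select(mtcars, -(mpg:disp))

names(first_three)
names(the_rest)

# Together the two selections account for every original column
ncol(first_three) + ncol(the_rest) == ncol(mtcars)
```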
### 3. Extension 1 (Boolean operations)
“:” selects a range of consecutive variables.
“!” takes the complement of a set of variables.
“&” and “|” select the intersection or union of two sets of variables.
“c()” combines selections.
Here we use the starwars and iris datasets to demonstrate. The iris output below prints as a tibble, which suggests it was first converted with as_tibble(iris).
starwars %>% select(name:mass)
#> # A tibble: 87 x 3
#> name height mass
#> <chr> <int> <dbl>
#> 1 Luke Skywalker 172 77
#> 2 C-3PO 167 75
#> 3 R2-D2 96 32
#> 4 Darth Vader 202 136
#> # ... with 83 more rows
The “!” operator negates a selection:
starwars %>% select(!(name:mass))
#> # A tibble: 87 x 11
#> hair_color skin_color eye_color birth_year sex gender homeworld species films vehicles starships
#> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <list> <list> <list>
#> 1 blond fair blue 19 male masculine Tatooine Human <chr [5]> <chr [2]> <chr [2]>
#> 2 <NA> gold yellow 112 none masculine Tatooine Droid <chr [6]> <chr [0]> <chr [0]>
#> 3 <NA> white, blue red 33 none masculine Naboo Droid <chr [7]> <chr [0]> <chr [0]>
#> 4 none white yellow 41.9 male masculine Tatooine Human <chr [4]> <chr [0]> <chr [1]>
#> # ... with 83 more rows
iris %>% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#> Sepal.Width Petal.Width Species
#> <dbl> <dbl> <fct>
#> 1 3.5 0.2 setosa
#> 2 3 0.2 setosa
#> 3 3.2 0.2 setosa
#> 4 3.1 0.2 setosa
#> # ... with 146 more rows
iris %>% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#> Sepal.Length Petal.Length Species
#> <dbl> <dbl> <fct>
#> 1 5.1 1.4 setosa
#> 2 4.9 1.4 setosa
#> 3 4.7 1.3 setosa
#> 4 4.6 1.5 setosa
#> # ... with 146 more rows
“&” and “|” take the intersection or union of two selections:
iris %>% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#> Petal.Width
#> <dbl>
#> 1 0.2
#> 2 0.2
#> 3 0.2
#> 4 0.2
#> # ... with 146 more rows
iris %>% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#> Petal.Length Petal.Width Sepal.Width
#> <dbl> <dbl> <dbl>
#> 1 1.4 0.2 3.5
#> 2 1.4 0.2 3
#> 3 1.3 0.2 3.2
#> 4 1.5 0.2 3.1
#> # ... with 146 more rows
The operators can also be combined:
iris %>% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#> Petal.Length
#> <dbl>
#> 1 1.4
#> 2 1.4
#> 3 1.3
#> 4 1.5
#> # ... with 146 more rows
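One way to read these operators is as ordinary set operations on column names. The following is a rough sketch of that reading, assuming dplyr is installed; it compares select() against base R's intersect() and union() on iris, and also shows c(), which is listed above but not demonstrated on its own. The object names are made up for illustration.

```r
library(dplyr)

# Column names matched by each helper on its own
petal <- names(select(iris, starts_with("Petal")))
width <- names(select(iris, ends_with("Width")))

# "&" behaves like a set intersection, "|" like a set union
both   <- names(select(iris, starts_with("Petal") & ends_with("Width")))
either <- names(select(iris, starts_with("Petal") | ends_with("Width")))

identical(both, intersect(petal, width))
identical(either, union(petal, width))

# c() combines selections in the order they are written
combined <- names(select(iris, c(Species, starts_with("Sepal"))))
combined
```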
In fact, select becomes much more powerful when combined with other functions; that will be covered in another note.
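As a small preview, here is a minimal sketch of select chained after filter with the pipe. It is only an illustration on the built-in mtcars data, not the content of the follow-up note; the object name `small` is made up.

```r
library(dplyr)

# Keep only the four-cylinder cars (rows), then only two columns of interest
small <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, hp)

dim(small)
head(small)
```

Each verb does one job, and the pipe passes the result of one step to the next, so the code reads in the order the operations happen.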