Adding a Column to a Spark DataFrame
2022-07-25 15:10:00 【南风知我意丿】
Method 1: Use createDataFrame; the new column is added while building the RDD and the schema
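import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType

// Build (value, flag) rows: values outside [critValueL, critValueR] get "F", the rest "T"
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// Extend the selected column's schema with a nullable string column "flag".
// The rows hold Double values, so the selected column must be DoubleType in the schema.
val schema = input.select(targetColumns).schema.add("flag", StringType, true)
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")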
Method 2: Use withColumn; the new column is computed by a UDF
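import org.apache.spark.sql.functions.udf

// Flag values outside [critValueL, critValueR] as "F"; this version expects an Int column
val code: Int => String = (arg: Int) => if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns)
  .withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")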
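Method 2 wraps the condition in a UDF. For a simple threshold check like this one, the same flag can usually be expressed with Spark's built-in when/otherwise column functions, which avoids defining a UDF. The following is a minimal sketch and not one of the original four methods; the object name FlagWithWhen, the helper addFlag, the column name "value", and the sample thresholds are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object FlagWithWhen {
  // Hypothetical helper: adds "flag" = "F" when the value lies outside [low, high],
  // otherwise "T", then renames the source column to "idx".
  def addFlag(df: DataFrame, column: String, low: Double, high: Double): DataFrame =
    df.withColumn("flag", when(col(column) > high || col(column) < low, "F").otherwise("T"))
      .withColumnRenamed(column, "idx")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the post itself assumes an existing session `ss`.
    val ss = SparkSession.builder().master("local[1]").appName("flag-when").getOrCreate()
    import ss.implicits._

    // Made-up sample data standing in for `input`.
    val input = Seq(1.0, 5.0, 9.0).toDF("value")
    addFlag(input, "value", 2.0, 8.0).show()

    ss.stop()
  }
}
```

Because when/otherwise stays inside Spark's expression engine, the optimizer can see the condition, whereas a UDF is opaque to it.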
Method 3: Use SQL; the new column is written directly into the SQL statement
input.select(targetColumns).createOrReplaceTempView("tmp")
// The column name and both thresholds are spliced into the SQL text
val sample3 = ss.sql(
  s"select distinct $targetColumns as idx, " +
  s"case when $targetColumns > $critValueR then 'F' " +
  s"when $targetColumns < $critValueL then 'F' else 'T' end as flag from tmp")
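Method 4: The three methods above add a column derived from a condition. To add a column of unique IDs instead, use monotonically_increasing_id. The generated IDs are unique and increasing, but they are not guaranteed to be consecutive.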
// Method 4: add an ID column
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())
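The IDs produced this way can jump between partitions. The Spark documentation describes the current implementation as putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The sketch below imitates that layout in plain Scala, without Spark, to show where the gaps come from; the object name MonotonicIdLayout and the helper composeId are hypothetical, and the layout is an implementation detail that may change.

```scala
object MonotonicIdLayout {
  // Assumed layout (per the Spark docs for the current implementation):
  // upper 31 bits = partition ID, lower 33 bits = row number within the partition.
  def composeId(partitionId: Int, rowInPartition: Long): Long =
    (partitionId.toLong << 33) + rowInPartition

  def main(args: Array[String]): Unit = {
    // Two made-up partitions with three rows each.
    val ids = for (p <- 0 until 2; r <- 0L until 3L) yield composeId(p, r)
    println(ids.mkString(", "))
    // prints: 0, 1, 2, 8589934592, 8589934593, 8589934594
  }
}
```

If strictly consecutive numbers are needed, a window function such as row_number is a common alternative, at the cost of ordering the data.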