Adding a Column to a Spark DataFrame
2022-07-25 15:10:00 【南风知我意丿】
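Method 1: Use createDataFrame. The new column is produced while building the RDD and the schema.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}

// Build each output row as (value, flag): "F" outside [critValueL, critValueR], otherwise "T"
val trdd = input.select(targetColumns).rdd.map { x =>
  val v = x.get(0).toString.toDouble
  if (v > critValueR || v < critValueL) Row(v, "F") else Row(v, "T")
}
// The rows hold Double values, so the value column is declared as DoubleType before flag is appended
val schema = new StructType().add(targetColumns, DoubleType, true).add("flag", StringType, true)
val sample3 = ss.createDataFrame(trdd, schema).distinct().withColumnRenamed(targetColumns, "idx")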
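Method 2: Use withColumn. The new column is computed inside a UDF.
import org.apache.spark.sql.functions.udf

// Returns "F" when the value falls outside [critValueL, critValueR], otherwise "T"
// (this UDF takes an Int, so it assumes the target column holds integers)
val code: Int => String = (arg: Int) =>
  if (arg > critValueR || arg < critValueL) "F" else "T"
val addCol = udf(code)
val sample3 = input.select(targetColumns).withColumn("flag", addCol(input(targetColumns)))
  .withColumnRenamed(targetColumns, "idx")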
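For a simple threshold rule like this one, the same flag can also be expressed with Spark's built-in `when`/`otherwise` column functions instead of a UDF. The sketch below is only an illustration under assumptions: the local `spark` session, the sample values, the column name `value`, and the bounds `2.0` and `8.0` are all hypothetical and do not come from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Hypothetical local session and sample data, for illustration only
val spark = SparkSession.builder().master("local[1]").appName("flag-column").getOrCreate()
import spark.implicits._

val critValueL = 2.0 // assumed lower bound
val critValueR = 8.0 // assumed upper bound
val df = Seq(1.0, 5.0, 9.0).toDF("value")

// "F" for values outside [critValueL, critValueR], "T" for everything else
val flagCol = when(col("value") > critValueR || col("value") < critValueL, "F").otherwise("T")
val flagged = df.withColumn("flag", flagCol).withColumnRenamed("value", "idx")

flagged.show()
```

Because the condition stays a plain column expression, Spark's optimizer can see into it, which is generally not the case for the body of a UDF.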
Method 3: Use SQL. The new column is written directly into the SQL statement.
input.select(targetColumns).createOrReplaceTempView("tmp")
// Same rule as above, expressed as a CASE expression over the temporary view
val sample3 = ss.sql(
  s"""select distinct $targetColumns as idx,
     |  case when $targetColumns > $critValueR then 'F'
     |       when $targetColumns < $critValueL then 'F'
     |       else 'T' end as flag
     |from tmp""".stripMargin)
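Method 4: The three methods above add a column computed from a condition. To add a column of unique IDs instead, use monotonically_increasing_id. The generated IDs are unique and increasing, but not necessarily consecutive.
// Method 4: add an ID column
import org.apache.spark.sql.functions.monotonically_increasing_id
val inputnew = input.withColumn("idx", monotonically_increasing_id())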
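`monotonically_increasing_id` encodes the partition ID in the upper bits of each value, so the result usually has large gaps between partitions. When a gap-free 0, 1, 2, ... index is needed, one option is `zipWithIndex` on the underlying RDD. The following is a minimal sketch, not the original author's method; the local `spark` session, the sample data, and the column name `name` are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.LongType

// Hypothetical local session and sample data, for illustration only
val spark = SparkSession.builder().master("local[2]").appName("row-index").getOrCreate()
import spark.implicits._

val df = Seq("a", "b", "c", "d").toDF("name").repartition(2)

// zipWithIndex numbers the rows 0, 1, 2, ... in RDD order across all partitions
val indexedRdd = df.rdd.zipWithIndex().map { case (row, i) => Row.fromSeq(row.toSeq :+ i) }

// Append a non-nullable Long column to the original schema to hold the index
val indexed = spark.createDataFrame(indexedRdd, df.schema.add("idx", LongType, false))

indexed.show()
```

Note that `zipWithIndex` runs an extra Spark job when the RDD has more than one partition, so it costs more than `monotonically_increasing_id`.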