[Untitled]
2022-06-26 09:36:00 【半_调_子】
Step 1: Download the full Hadoop binary package
Step 2: Download the Spark package
Step 3: Download Java
Step 4: Download Anaconda
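# Create a virtual environment named pyspark, based on Python 3.8
conda create -n pyspark python=3.8
# Switch into the virtual environment
conda activate pyspark
# Install the packages inside the virtual environment
pip install pyhive pyspark jieba -i https://pypi.tuna.tsinghua.edu.cn/simple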
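Once the install finishes, it can help to confirm that the three packages are visible to the interpreter of the pyspark environment. The following is a minimal sketch of such a check, not part of the original setup; the helper name `missing_packages` is hypothetical, and it only tests whether each module can be located, not whether Spark itself starts.

```python
import importlib.util


def missing_packages(names):
    """Return the module names that the current interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the packages installed by the pip command above.
    missing = missing_packages(["pyhive", "pyspark", "jieba"])
    print("missing:", missing if missing else "none")
```

Run it with the Python interpreter of the activated pyspark environment; any name it lists was not installed into that environment.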
Write the code in PyCharm:
# coding:utf8
from pyspark import SparkConf, SparkContext
import os

# Point Spark at the local Java, Spark, Python and Hadoop installations
# (adjust these paths to match your own machine)
os.environ['JAVA_HOME'] = r"C:\Java\jdk1.8.0_201"
os.environ['SPARK_HOME'] = r"D:\spark-3.1.2-bin-hadoop2.7"
os.environ['PYSPARK_PYTHON'] = r"D:\anaconda3\envs\pyspark\python.exe"
os.environ['HADOOP_HOME'] = r"D:\hadoop-2.7.7"

if __name__ == '__main__':
    conf = SparkConf().setAppName("helloword")
    # Build the SparkContext object from the SparkConf object
    sc = SparkContext(conf=conf)
    # Read the input file as an RDD of lines
    file_rdd = sc.textFile("./myfile.text")
    # Split each line on spaces and flatten the result into an RDD of words
    words_rdd = file_rdd.flatMap(lambda line: line.split(" "))
    # Convert each word into a tuple: the key is the word, the value is the number 1
    words_with_one_rdd = words_rdd.map(lambda x: (x, 1))
    # Group the tuple values by key and aggregate (sum) all values for each key
    result_rdd = words_with_one_rdd.reduceByKey(lambda a, b: a + b)
    # Collect the RDD data with collect() and print the result
    print(result_rdd.collect())
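To see what the flatMap, map and reduceByKey steps compute without starting Spark, the same word count can be sketched in plain Python. This is only an illustration of the logic under the assumption that the input is a list of lines; the function name `word_count` is hypothetical, and the code runs on a single machine rather than on distributed RDDs.

```python
from collections import defaultdict


def word_count(lines):
    """Count words in a list of lines, mirroring the three RDD steps."""
    # flatMap: split every line on spaces and flatten into one list of words
    words = [word for line in lines for word in line.split(" ")]
    # map: pair each word with the number 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the values that share the same key
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    print(word_count(["hello spark", "hello hadoop"]))
```

The Spark version returns a list of (word, count) tuples from collect(), while this sketch returns a dictionary; the counts themselves are the same.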