
Spark runs wordcount (case 2)

2022-06-25 11:29:00 zhangvalue

Running WordCount on Spark (Case 2)

For details, refer to Running WordCount on Spark (Case 1):

https://zhangvalue.blog.csdn.net/article/details/122501292

Preliminary preparation: install Spark on a Mac and run SparkPi (see https://zhangvalue.blog.csdn.net/article/details/122501186). Download Spark 2.4.7 from https://archive.apache.org/dist/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz and extract the tgz file with tar xvf spark-2.4.7-bin-hadoop2.7.tgz. Then create a Scala project, compile it into a jar, and submit that jar with spark-submit.
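The "create a Scala project and compile it into a jar" step can be done with sbt. A minimal build.sbt sketch, assuming Scala 2.11 (the default build target for Spark 2.4.7) and the "provided" scope so Spark itself is not bundled into the jar:

```scala
// build.sbt -- minimal sketch; project name and versions are assumptions
name := "wordcount"
version := "0.1"
scalaVersion := "2.11.12"
// "provided" keeps spark-core out of the packaged jar; the cluster supplies it at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.7" % "provided"
```

Running sbt package then produces the jar that spark-submit will upload.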

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Count the number of occurrences of each word in a text file.
  */
object WorkCount {
  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      System.err.println("Usage: WorkCount <file>")
      System.exit(1)
    }
    val conf = new SparkConf()
    // SparkContext is the channel through which code is submitted to a cluster
    // (or run locally); every Spark program needs a SparkContext instance.
    val sc = new SparkContext(conf)
    // textFile returns the file's contents as an RDD of lines;
    // all subsequent Spark code operates on RDDs.
    val line = sc.textFile(args(0))
    // Split each line into words, pair each word with 1, sum the counts
    // per word, then collect the results to the driver and print them.
    line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)

    sc.stop()
  }
}
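The flatMap/map/reduceByKey chain is the heart of the job, and the same transformation can be traced on plain Scala collections without a cluster. The sketch below (names are illustrative, not from the original post) mirrors the RDD pipeline, with groupBy plus a sum standing in for reduceByKey:

```scala
// Plain-collections mirror of the RDD word-count pipeline above.
object WordCountLocal {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                          // lines -> words
      .map((_, 1))                                    // word -> (word, 1)
      .groupBy(_._1)                                  // group pairs by word
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum the 1s per word

  def main(args: Array[String]): Unit = {
    // Print counts in alphabetical order for readability.
    wordCount(Seq("a b a", "b c")).toSeq.sortBy(_._1).foreach(println)
  }
}
```

This is handy for sanity-checking the logic before packaging the jar; on Spark, reduceByKey additionally combines counts on each partition before shuffling.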

#!/bin/bash

cd $SPARK_HOME/bin
spark-submit \
--master spark://localhost:7077 \
--class WorkCount \
--name WorkCount \
--executor-memory 2048M \
--driver-memory 3096M \
/Users/zhangsf/bigdata/myjar/wordcount.jar \
hdfs://localhost:9000/zhangvalue/input/poet.txt
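If no standalone master is running at spark://localhost:7077, the same jar can be submitted in local mode for a quick check. This variant is a sketch that assumes the jar and input paths from the script above:

```shell
#!/bin/bash
# Local-mode submission sketch: local[2] runs the job in-process with two
# worker threads, so no standalone master needs to be started.
cd $SPARK_HOME/bin
spark-submit \
--master local[2] \
--class WorkCount \
--name WorkCount \
/Users/zhangsf/bigdata/myjar/wordcount.jar \
hdfs://localhost:9000/zhangvalue/input/poet.txt
```

In local mode the executor/driver memory flags are unnecessary, since everything runs in a single JVM.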
