SPARK#
1 Scala Installation#
Official site: The Scala Programming Language
Download page: All Available Versions | The Scala Programming Language
Version 2.13.16
After uploading to the /opt directory:
```bash
# Extract
tar -zxvf /opt/scala-2.13.16.tgz -C /usr/local/software/
# Rename
cd /usr/local/software/
mv scala-2.13.16/ scala
```
Add the environment variables: vim /etc/profile.d/my_env.sh
```bash
# Append the following
export SCALA_HOME=/usr/local/software/scala
export PATH=$PATH:$SCALA_HOME/bin
```
Apply it: source /etc/profile.d/my_env.sh
Running scala directly shows whether the installation succeeded (:quit to exit).
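For example, a minimal check (a sketch; the exact version banner may differ):
```bash
# Should report Scala 2.13.16
scala -version
# Launch the REPL; leave it with :quit
scala
```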
2 Spark Installation#
This walkthrough uses the Scala build of Spark.
Download page: Downloads | Apache Spark
2.1 Basic Installation#
After uploading to the /opt directory:
```bash
# Extract
tar -zxvf /opt/spark-3.5.5-bin-hadoop3-scala2.13.tgz -C /usr/local/software/
# Rename
cd /usr/local/software/
mv spark-3.5.5-bin-hadoop3-scala2.13/ spark
```
Configure the environment variables: vim /etc/profile.d/my_env.sh
```bash
export SPARK_HOME=/usr/local/software/spark
export SPARKPYTHON=/usr/local/software/spark/python
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SPARKPYTHON
```
Apply it: source /etc/profile
Spark test:
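For example, a minimal local smoke test (a sketch; run-example ships in Spark's bin directory and runs the bundled SparkPi example without a cluster):
```bash
# Should print an approximation of Pi near the end of the output
run-example SparkPi 10
# Or open the interactive Scala shell; leave it with :quit
spark-shell
```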
2.2 Configuring Spark#
Configuration file: spark/conf/spark-env.sh
Copy the template: cp conf/spark-env.sh.template conf/spark-env.sh
spark-env.sh (append ↓ at the end of the file; adjust the software paths and IPs/hostnames for your own setup)
```bash
export JAVA_HOME=/usr/local/software/jdk
export HADOOP_HOME=/usr/local/software/hadoop
export HADOOP_CONF_DIR=/usr/local/software/hadoop/etc/hadoop
export JAVA_LIBRARY_PATH=/usr/local/software/hadoop/lib/native
export SPARK_DIST_CLASSPATH=$(/usr/local/software/hadoop/bin/hadoop classpath)
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=hadoop101:2181,hadoop102:2181,hadoop103:2181
-Dspark.deploy.zookeeper.dir=/spark"
export SPARK_WORKER_MEMORY=8g
export SPARK_WORKER_CORES=8
export SPARK_MASTER_WEBUI_PORT=6633
```
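The ZOOKEEPER recovery mode above relies on a ZooKeeper ensemble already running on hadoop101-103. A quick, optional check (a sketch; it assumes zkServer.sh is on each node's PATH and passwordless SSH is set up):
```bash
# Each node should report "leader" or "follower"
for h in hadoop101 hadoop102 hadoop103; do
  ssh $h "zkServer.sh status"
done
```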
2.3 Cluster Installation#
-> hadoop101
From inside the spark directory:
```bash
# Copy the workers template
cp conf/workers.template conf/workers
# Copy the template for event/history logging
cp conf/spark-defaults.conf.template conf/spark-defaults.conf
```
spark/conf/workers
Update the cluster node list: clear the file and write:
```
hadoop101
hadoop102
hadoop103
```
spark/conf/spark-defaults.conf
Configure event logging by appending ↓ (hadoopCluster should match the cluster name, i.e. the HDFS nameservice):
```
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoopCluster/spark-log
```
hadoop101
Create the spark-log directory in HDFS:
```bash
hdfs dfs -mkdir /spark-log
```
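Optionally, confirm that the directory is reachable through the nameservice URI used above (hadoopCluster here must match the dfs.nameservices value of the HDFS HA configuration):
```bash
# Should list spark-log among the root entries
hdfs dfs -ls hdfs://hadoopCluster/
```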
spark/conf/spark-env.sh
Configure the history server by appending ↓ (hadoopCluster should match the cluster name, i.e. the HDFS nameservice):
```bash
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://hadoopCluster/spark-log"
```
Rename Spark's startup scripts (their names conflict with Hadoop's start/stop scripts):
```bash
# From inside the spark directory
mv sbin/start-all.sh sbin/start-spark.sh
mv sbin/stop-all.sh sbin/stop-spark.sh
```
hadoop101
Distribute Spark to the other cluster nodes:
```bash
scp -r /usr/local/software/spark/ root@hadoop102:/usr/local/software/
scp -r /usr/local/software/spark/ root@hadoop103:/usr/local/software/
```
Finally, configure the environment variables on the hadoop102 and hadoop103 nodes as well (and apply them with source /etc/profile afterwards):
```bash
# Append the following to /etc/profile.d/my_env.sh on hadoop102 and hadoop103
export SPARK_HOME=/usr/local/software/spark
export SPARKPYTHON=/usr/local/software/spark/python
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SPARKPYTHON
```
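Optionally, verify the distribution and environment on the other nodes (a sketch; it assumes the same passwordless SSH used for the scp step):
```bash
# Each node should print the Spark 3.5.5 version banner
ssh hadoop102 "source /etc/profile && spark-submit --version"
ssh hadoop103 "source /etc/profile && spark-submit --version"
```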
3 Starting and Stopping Spark#
3.1 Start#
hadoop101
```bash
# Start the Spark cluster
start-spark.sh
# Start the history server
start-history-server.sh
```
After startup, jps should additionally show the Master, Worker, and HistoryServer processes.
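To check all nodes in one pass (a sketch; it expects the Master on hadoop101 and a Worker on each of the three nodes):
```bash
for h in hadoop101 hadoop102 hadoop103; do
  echo "== $h =="
  ssh $h jps
done
```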
Once startup completes, the web UIs are reachable at:
Spark cluster (Master web UI)
http://192.168.170.101:6633
http://hadoop101:6633
Spark history server
http://192.168.170.101:18080
http://hadoop101:18080
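To confirm the standalone cluster accepts jobs, a submission against the master can be sketched as follows (it assumes the default master RPC port 7077; the example jar path may differ between builds):
```bash
spark-submit --master spark://hadoop101:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
```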
3.2 Stop#
hadoop101
```bash
stop-history-server.sh
stop-spark.sh
```