py4j.protocol.Py4JJavaError: An error occurred while calling o90.save
Environment:
Ubuntu19.10
anaconda3-python3.6.10
scala 2.11.8
apache-hive-3.0.0-bin
hadoop-2.7.7
spark-2.3.1-bin-hadoop2.7
java version "1.8.0_131"
Mysql Server version: 8.0.19-0ubuntu0.19.10.3 (Ubuntu)
Driver: mysql-connector-java-8.0.20.jar
Driver download: https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20
The code used is:
```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql import SQLContext

def map_extract(element):
    file_path, content = element
    year = file_path[-8:-4]
    return [(year, i) for i in content.split("\n") if i]

spark = SparkSession \
    .builder \
    .appName("PythonTest") \
    .getOrCreate()

res = spark.sparkContext \
    .wholeTextFiles('hdfs://Desktop:9000/user/mercury/names', minPartitions=40) \
    .map(map_extract) \
    .flatMap(lambda x: x) \
    .map(lambda x: (x[0], int(x[1].split(',')[2]))) \
    .reduceByKey(lambda x, y: x + y)

df = res.toDF(["key", "num"])  # rename the columns to match the target MySQL table
df.printSchema()
df.show()

df.write.format("jdbc").options(
    url="jdbc:mysql://127.0.0.1:3306/leaf",
    driver="com.mysql.cj.jdbc.Driver",
    dbtable="spark",
    user="appleyuchi",
    password="appleyuchi"
).mode('append').save()
```
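To see what the pipeline computes before the JDBC write, the per-file extraction step can be exercised on its own. The element below imitates what `wholeTextFiles` yields, a `(path, content)` pair; the file name and CSV rows are made up for illustration:

```python
# Hypothetical sample element in the shape wholeTextFiles produces:
# (file path, whole file content). Path and rows are invented for this demo.
element = ("hdfs://Desktop:9000/user/mercury/names/yob1990.txt",
           "Mary,F,1000\nJohn,M,900\n")

def map_extract(element):
    file_path, content = element
    year = file_path[-8:-4]          # "...yob1990.txt"[-8:-4] -> "1990"
    return [(year, line) for line in content.split("\n") if line]

rows = map_extract(element)
# the later .map() turns each row into (year, count) via the third CSV field
pairs = [(year, int(line.split(',')[2])) for year, line in rows]
print(pairs)  # [('1990', 1000), ('1990', 900)]
```

The `reduceByKey` stage then sums these per-year counts, which is what ends up in the `key`/`num` DataFrame.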
Submission methods (each of the following reproduces the bug):
① pyspark --master yarn (then enter the code above in the interactive shell)
② spark-submit --master yarn --deploy-mode cluster 源码.py
③ pyspark --master yarn --conf spark.executor.extraClassPath=/home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar
This last form also reports a similar error.
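A common cause of a `Py4JJavaError` on `save()` with a JDBC sink is that the connector jar is missing from the driver or executor classpath. One way to ship the jar to both sides is to declare it when building the session; this is a sketch only, and the jar path below reuses the one from method ③, so adjust it to your layout:

```python
from pyspark.sql import SparkSession

# Sketch: spark.jars distributes the listed jars to the driver and all
# executors. The path is an assumption based on method ③ above.
jar = "/home/appleyuchi/bigdata/apache-hive-3.0.0-bin/lib/mysql-connector-java-8.0.20.jar"

spark = SparkSession.builder \
    .appName("PythonTest") \
    .config("spark.jars", jar) \
    .getOrCreate()
```

The CLI equivalent is `pyspark --master yarn --jars /path/to/mysql-connector-java-8.0.20.jar`, which unlike `spark.executor.extraClassPath` also uploads the jar for you.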
Solution:
https://gitee.com/appleyuchi/cluster_configuration/blob/master/物理环境配置流程-必须先看.txt