Converting a PySpark DataFrame to an RDD
This article demonstrates how to convert an RDD into a DataFrame and back again in PySpark, using a small employee-salary example as a reference.
# -*- coding: utf-8 -*-
from __future__ import print_function
from pyspark.sql import SparkSession
from pyspark.sql import Row

if __name__ == "__main__":
    # Initialize the SparkSession
    spark = SparkSession \
        .builder \
        .appName("RDD_and_DataFrame") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()
    sc = spark.sparkContext

    # Read the text file and parse each "name,salary" line into a Row
    lines = sc.textFile("employee.txt")
    parts = lines.map(lambda l: l.split(","))
    employee = parts.map(lambda p: Row(name=p[0], salary=int(p[1])))

    # Convert the RDD to a DataFrame
    employee_temp = spark.createDataFrame(employee)

    # Show the DataFrame contents
    employee_temp.show()

    # Register a temporary view so the data can be queried with SQL
    employee_temp.createOrReplaceTempView("employee")

    # Filter the data
    employee_result = spark.sql(
        "SELECT name, salary FROM employee "
        "WHERE salary >= 14000 AND salary <= 20000")

    # Convert the DataFrame back to an RDD
    result = employee_result.rdd \
        .map(lambda p: "name: " + p.name + " salary: " + str(p.salary)) \
        .collect()

    # Print the RDD contents
    for n in result:
        print(n)

    spark.stop()
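The per-record logic in this pipeline is ordinary Python, so it can be sketched and checked without a Spark cluster. Below is a minimal sketch of the three transformation steps, the line parse, the salary filter from the SQL predicate, and the output formatting; the function names and the sample data are my own, not from the article, and the "name,salary" line format is as implied by the code above:

```python
def parse_employee(line):
    # Mirrors lines.map(lambda l: l.split(",")) plus the Row construction:
    # a "name,salary" line becomes a record with a string name and an int salary.
    name, salary = line.split(",")
    return {"name": name, "salary": int(salary)}

def in_salary_band(emp, low=14000, high=20000):
    # Mirrors the SQL predicate: salary >= 14000 AND salary <= 20000.
    return low <= emp["salary"] <= high

def format_employee(emp):
    # Mirrors the final rdd.map that builds the printable string.
    return "name: " + emp["name"] + " salary: " + str(emp["salary"])

if __name__ == "__main__":
    # Hypothetical sample lines standing in for employee.txt
    sample = ["Alice,15000", "Bob,9000", "Carol,20000"]
    rows = [parse_employee(l) for l in sample]
    for line in (format_employee(e) for e in rows if in_salary_band(e)):
        print(line)
```

Keeping the lambdas as named functions like these also makes the Spark job easier to unit-test, since each step can be exercised on plain Python values.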
Summary
The example above builds an RDD of Row objects from a text file, converts it to a DataFrame with createDataFrame(), registers a temporary view for SQL filtering, and then converts the query result back to an RDD through the DataFrame's .rdd attribute. Hopefully it serves as a useful reference for moving between the two APIs.