Tuesday, August 21, 2018

Saving PySpark DataFrame data to HDFS

If you are running PySpark on a single machine with Hadoop configured (for example in pseudo-distributed mode) and you try to save your output, Spark will write it to an HDFS directory rather than the local filesystem, because bare paths are resolved against the configured default filesystem.

So start Hadoop (HDFS) before exporting or saving output data from PySpark.


import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A small DataFrame of (name, age) rows to save.
table1 = [('dh1', 21), ('dh2', 22), ('dh3', 33)]
df_table1 = spark.createDataFrame(table1, ['name', 'age'])

## Hadoop is running in pseudo-distributed mode, so the path below is an HDFS path

df_table1.write.format('csv').save("/home/dheerendra/Working/pyfiles/test3")
