If you run PySpark on a single machine where Hadoop is configured (i.e. `fs.defaultFS` points at HDFS), Spark resolves unqualified save paths against HDFS, not the local filesystem.
So start Hadoop before exporting or saving output data from PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

table1 = [('dh1', 21), ('dh2', 22), ('dh3', 33)]
df_table1 = spark.createDataFrame(table1, ['name', 'age'])

## Hadoop is running in pseudo-distributed mode, so the path below resolves to HDFS
df_table1.write.format('csv').save("/home/dheerendra/Working/pyfiles/test3")