Wednesday, September 23, 2020

PySpark check availability of file on HDFS

Case :- Check whether a file exists on HDFS by obtaining the Hadoop configuration from the Spark session and using it to test the file path.


file_system = spark._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())

file_check = file_system.exists(spark._jvm.org.apache.hadoop.fs.Path("hdfs://file_path/"))

The exists() call returns a Boolean: True if the path exists, False otherwise.

if file_check:

    createDataframe()  # path exists, proceed to build the DataFrame

else:

    print('Error')  # path not found, stop processing
