We need to change the data type of the age field from string to integer:
>>> df4.printSchema()
root
|-- cat: string (nullable = true)
|-- age: string (nullable = true)
|-- addr: string (nullable = true)
Solution:
>>> df5=df4.withColumn('age',df4.age.cast('Integer'))
>>> df5.printSchema()
root
|-- cat: string (nullable = true)
|-- age: integer (nullable = true)
|-- addr: string (nullable = true)
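When several columns need a new type, writing one withColumn call per column gets repetitive. One option is to express the casts as Spark SQL strings and pass them to DataFrame.selectExpr. This is a minimal sketch under that assumption. The helper build_cast_exprs is hypothetical and not part of PySpark. It only builds the strings, so it runs without a Spark session.

```python
def build_cast_exprs(columns, casts):
    """Build Spark SQL expressions that keep every column, casting the ones in `casts`.

    columns: column names in their original order (e.g. df.columns)
    casts:   mapping of column name -> Spark SQL type name (e.g. {"age": "INT"})
    Note: this is a hypothetical helper, not a PySpark API.
    """
    exprs = []
    for name in columns:
        if name in casts:
            # Cast the column and alias it back to its original name
            exprs.append(f"CAST(`{name}` AS {casts[name]}) AS `{name}`")
        else:
            # Pass the column through unchanged (backticks quote the identifier)
            exprs.append(f"`{name}`")
    return exprs


exprs = build_cast_exprs(["cat", "age", "addr"], {"age": "INT"})
print(exprs)
```

With the schema above, df4.selectExpr(*build_cast_exprs(df4.columns, {'age': 'INT'})) should produce the same schema as df5 and keep the original column order.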
Change from string to date:
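from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

def change_dtypes():
    spark = SparkSession.builder.appName('Altimatic_test').getOrCreate()
    # Parquet stores its own schema; the CSV-style 'header' option has no effect here
    empdf = spark.read.format('parquet').load('/projects/data/test/parquet2/')
    # 'MM' is month; 'mm' means minute-of-hour, so "yyyy-mm-dd" loses the month
    empdf2 = (empdf.withColumn('JoiningDate_str', to_date(empdf.JoiningDate, 'yyyy-MM-dd'))
                   .withColumn('ID2', empdf.ID.cast('Integer')))
    empdf2.write.csv('/projects/data/test/csv/')
    print(empdf2.dtypes)

# Run the method
change_dtypes()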
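Before running the Spark job, you can check what the corrected pattern should produce by trying the same conversion on a few sample strings in plain Python. This example substitutes Python's datetime.strptime for Spark's parser, so it only approximates Spark's behavior. The helper parse_joining_date is hypothetical. Python's %m (month) and %M (minute) form the same trap as Spark's 'MM' and 'mm'.

```python
from datetime import date, datetime
from typing import Optional


def parse_joining_date(value: Optional[str]) -> Optional[date]:
    """Parse a 'yyyy-MM-dd' string into a date, or return None if it does not match.

    Returning None loosely mirrors Spark's to_date, which yields null for
    unparseable input when ANSI mode is off (an assumption about your config).
    """
    try:
        # %m is month and %M is minute, like Spark's 'MM' vs 'mm'
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers wrong formats or impossible dates
        return None


for sample in ["2021-03-15", "15/03/2021", None]:
    print(sample, "->", parse_joining_date(sample))
```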