Sunday, September 20, 2020

PySpark Change DataType

We need to change the data type of the age field from String to Integer.

>>> df4.printSchema()
root
 |-- cat: string (nullable = true)
 |-- age: string (nullable = true)
 |-- addr: string (nullable = true)


Solution:

>>> df5 = df4.withColumn('age', df4.age.cast('Integer'))
>>> df5.printSchema()
root
 |-- cat: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- addr: string (nullable = true)


Change string to date:

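from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

def change_dtypes():
    spark = SparkSession.builder.appName('Altimatic_test').getOrCreate()
    # Parquet files carry their own schema, so no 'header' option is needed.
    empdf = spark.read.format('parquet').load('/projects/data/test/parquet2/')
    # 'MM' is month-of-year; lowercase 'mm' would mean minute-of-hour.
    empdf2 = (empdf
              .withColumn('JoiningDate_str', to_date(empdf.JoiningDate, 'yyyy-MM-dd'))
              .withColumn('ID2', empdf.ID.cast('Integer')))
    empdf2.write.csv('/projects/data/test/csv/')
    print(empdf2.dtypes)

# Call the method
change_dtypes()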

