Sunday, January 11, 2026

Compare array list with Dataframe column

If a DataFrame has an array-type column and you need to check whether it contains the value of another column of the same DataFrame, you can achieve this with the "expr" function. For more details, please see the code below.

from pyspark.sql import functions as sf



df_desc_split = df_trx.withColumn('split_desc', sf.split(sf.col('description'), ' '))

df_name_flg = (
    df_desc_split
    .withColumn("first_name_flag", sf.expr("array_contains(split_desc, FIRSTNAME)"))
    .withColumn("middle_name_flag", sf.expr("array_contains(split_desc, MIDDLE)"))
)

Explanation: I have a "description" field containing a string like "My name is dheerendra", which I split and keep in the array field "split_desc" as ['My','name','is','dheerendra'].

Now suppose the same DataFrame has another column, e.g. "name", which contains "dheerendra":

|name       |split_desc                      |
|-----------|--------------------------------|
|dheerendra |['My','name','is','dheerendra'] |

To check for the existence of the "name" value ('dheerendra') in the "split_desc" field, use the "expr" function along with the array_contains function:

df_desc_split.withColumn("name_flag", sf.expr("array_contains(split_desc, name)"))
