Saturday, September 19, 2020

Integration of PySpark with Hive

Problem:- You want to use Hive tables from Spark SQL and vice versa. In that case, follow the steps below.


Solution:- Please find the steps below.


Step 1- Copy hive-site.xml from the Hive configuration directory ($HIVE_HOME/conf) to the Spark configuration directory ($SPARK_HOME/conf).


Step 2- Add the property below to hive-site.xml in both Spark and Hive.

You can use localhost or the IP address of the machine running the metastore.


            <property>
                <name>hive.metastore.uris</name>
                <value>thrift://localhost:9083</value>
                <description>Thrift URI for the remote metastore. Used by the metastore client to connect to the remote metastore.</description>
            </property>
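
On the Spark side, the same metastore URI can also be set programmatically when building the SparkSession, instead of (or in addition to) editing hive-site.xml. A minimal sketch, assuming the metastore from Step 4 is running on localhost:9083 (the app name is illustrative):

    from pyspark.sql import SparkSession

    # Point Spark at the remote Hive metastore and enable Hive support.
    # thrift://localhost:9083 matches the hive.metastore.uris value above.
    spark = (
        SparkSession.builder
        .appName("hive-integration")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate()
    )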



Step 3- Add the Hive path in spark-env.sh.

         e.g.  export HIVE_HOME=$HOME/Software/hive


Step 4- Start the Hive metastore using the command below:

           bin/hive --service metastore
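
By default the metastore listens on port 9083. As a quick sanity check, you can verify the Thrift port is reachable from Python (a minimal sketch, assuming the localhost:9083 value configured above):

    import socket

    # Try to open a TCP connection to the metastore's Thrift port.
    try:
        conn = socket.create_connection(("localhost", 9083), timeout=5)
        conn.close()
        print("Hive metastore is reachable on port 9083")
    except OSError as err:
        print("Metastore not reachable:", err)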


Step 5- Start pyspark and verify that the Hive databases are visible:

       >>> spark.sql("show databases").show()
       +------------+
       |databaseName|
       +------------+
       |     default|
       |     sparkdb|
       |      testdb|
       +------------+
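
To exercise the "vice versa" direction, create a table from pyspark; because both sides share the same metastore, it becomes visible from the Hive CLI as well. A minimal sketch (the sparkdb database appears in the listing above; the people table and its rows are illustrative):

    # Create an illustrative table through the shared metastore.
    spark.sql("CREATE DATABASE IF NOT EXISTS sparkdb")

    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    df.write.mode("overwrite").saveAsTable("sparkdb.people")

    # Read it back with Spark SQL; the same table is now queryable from Hive.
    spark.sql("SELECT * FROM sparkdb.people").show()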



If you get the error below while starting Hive or running a Spark SQL command in Spark:


Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D


Solution:- Add the properties below to hive-site.xml, pointing system:java.io.tmpdir at an appropriate folder under /tmp.


  <property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/spark</value>
  </property>

  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>
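
The folder named in system:java.io.tmpdir must exist and be writable by the user running Hive and Spark. A minimal sketch to pre-create it (the permission bits are an assumption; adjust to your environment):

    import os

    # Pre-create the scratch directory referenced by system:java.io.tmpdir.
    # /tmp/hive/spark matches the value configured above.
    os.makedirs("/tmp/hive/spark", exist_ok=True)
    os.chmod("/tmp/hive/spark", 0o777)  # assumption: world-writable scratch dir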

