Friday, August 2, 2019

Integration of PySpark with Hive

Problem:- You want to query Hive tables from Spark SQL (and vice versa). In that case you need to follow the steps below.

Solution:- Please follow these steps.

Step 1- Copy hive-site.xml from the Hive configuration directory (e.g. $HIVE_HOME/conf) to the Spark configuration directory (e.g. $SPARK_HOME/conf).

Step 2- Add the property below to hive-site.xml in both Spark and Hive.

You can provide localhost or the IP address of the metastore machine.

            <property>
                <name>hive.metastore.uris</name>
                <value>thrift://localhost:9083</value>
                <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
            </property>
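To confirm the property was picked up correctly, you can parse hive-site.xml and read the value back. A minimal sketch using only the Python standard library; the file path passed in is whatever location you copied hive-site.xml to:

```python
import xml.etree.ElementTree as ET

def get_metastore_uri(hive_site_path):
    """Return the value of hive.metastore.uris from a hive-site.xml file,
    or None if the property is not set."""
    root = ET.parse(hive_site_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "hive.metastore.uris":
            return prop.findtext("value")
    return None
```

For example, `get_metastore_uri("/spark/conf/hive-site.xml")` should print `thrift://localhost:9083` if the property above was added correctly.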


Step 3- Add the Hive path in spark-env.sh.
         e.g. export HIVE_HOME=$HOME/Software/hive

Step 4- Start the Hive metastore using the command below:
           bin/hive --service metastore

Step 5- Start pyspark and verify that the Hive databases are visible:
       >>> spark.sql("show databases").show()
+------------+
|databaseName|
+------------+
|     default|
|     sparkdb|
|      testdb|
+------------+
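The same check can be done programmatically. A minimal sketch, assuming PySpark is installed and the metastore started in Step 4 is reachable at thrift://localhost:9083 (the function name and app name are hypothetical):

```python
def list_hive_databases(metastore_uri="thrift://localhost:9083"):
    """Build a Hive-enabled SparkSession and return the visible database
    names. Requires a running Hive metastore at metastore_uri."""
    # Import inside the function so this module loads even without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-integration-check")        # hypothetical app name
        .config("hive.metastore.uris", metastore_uri)
        .enableHiveSupport()   # makes spark.sql() see Hive tables
        .getOrCreate()
    )
    # Use positional access: the column is named databaseName in Spark 2.x
    # and namespace in Spark 3.x.
    return [row[0] for row in spark.sql("SHOW DATABASES").collect()]
```

With the setup above, calling `list_hive_databases()` should return the same names shown in the pyspark shell output (default, sparkdb, testdb).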


If you are getting the error below while starting Hive or running a Spark SQL command:

Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D

Solution:- Point these properties at an appropriate folder under /tmp by adding them to hive-site.xml:

<property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/spark</value>
</property>
<property>
    <name>system:user.name</name>
    <value>${user.name}</value>
</property>
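It also helps to create that scratch directory up front with permissive access, so Hive does not fail the first time it tries to use it. A small sketch; the /tmp/hive/spark path matches the property above:

```python
import os

def prepare_hive_tmpdir(base="/tmp/hive/spark"):
    """Create the scratch directory referenced by system:java.io.tmpdir
    so Hive does not fail with the URISyntaxException above."""
    os.makedirs(base, exist_ok=True)
    os.chmod(base, 0o777)  # scratch dirs are typically made world-writable
    return base
```

Run it once (e.g. `prepare_hive_tmpdir()`) before restarting Hive or Spark.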



