Tuesday, August 21, 2018

Pyspark setup with Anaconda on Ubuntu

Please find below the steps to set up PySpark with Anaconda on Ubuntu.

(1) Download an Anaconda installer (for example Anaconda2-5.2.0-Linux-x86_64.sh; any version will work) from the link below:
     https://www.anaconda.com/download/#linux

(2) Copy Anaconda2-5.2.0-Linux-x86_64.sh to the directory where you want to install Anaconda.

(3) Go to the directory where you kept the .sh file and run it to start the Anaconda installation:
 
    chmod +x Anaconda2-5.2.0-Linux-x86_64.sh
    sh Anaconda2-5.2.0-Linux-x86_64.sh

(4) After a successful installation, open a terminal and run the command below.
      dheerendra@dheerendra-PC ~ $ jupyter notebook

(5) A window at http://localhost:8888/tree will open in your default browser.

(6) Create a new Python notebook and build a DataFrame using the code below:

     import pyspark
     from pyspark.sql import SparkSession

     # Create (or reuse) a SparkSession
     spark = SparkSession.builder.getOrCreate()

     # Build a DataFrame from a list of (name, age) tuples
     table1 = [('dh1', 21), ('dh2', 22), ('dh3', 33)]
     df_table1 = spark.createDataFrame(table1, ['name', 'age'])
     df_table1.show()

Output :

+----+---+
|name|age|
+----+---+
| dh1| 21|
| dh2| 22|
| dh3| 33|
+----+---+

If you get an ImportError while running the code above, the pyspark package is likely missing. Install pip and then pyspark with the commands below:

sudo apt-get install python-pip
pip install pyspark

After the installation, re-run the cell and it will give you the same result.
