Import SQL Data in spark/pyspark cli

Apache Spark is a lightning-fast unified analytics engine for big data and machine learning.

While trying to connect to and retrieve data from a SQL Database using the spark/pyspark CLI, you might receive the error below:

File '/usr/hdp/current/spark2-client/python/lib/', line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o58.jdbc.
: java.sql.SQLException: No suitable driver
        at java.sql.DriverManager.getDriver(

This is because, by default, the SQL Server JDBC driver is not loaded into the spark/pyspark CLI.
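The value given to `--driver-class-path` is a Java classpath, so it should name the driver jar file itself rather than just the folder that contains it. As a minimal sketch, assuming the jar follows Microsoft's usual file names (`sqljdbc*.jar` or `mssql-jdbc*.jar`), the Python helper below locates the jar in a lib folder and assembles the pyspark command line. The helper names and the folder you pass in are hypothetical; on HDInsight you would point it at the Hive lib folder.

```python
import glob
import os

# Assumed file-name patterns for Microsoft's SQL Server JDBC driver jars
# (for example sqljdbc42.jar or mssql-jdbc-<version>.jre8.jar).
DRIVER_JAR_PATTERNS = ("sqljdbc*.jar", "mssql-jdbc*.jar")


def find_sqlserver_driver_jars(lib_dir):
    """Return the sorted paths of SQL Server JDBC driver jars found in lib_dir."""
    jars = set()
    for pattern in DRIVER_JAR_PATTERNS:
        jars.update(glob.glob(os.path.join(lib_dir, pattern)))
    return sorted(jars)


def pyspark_command(lib_dir):
    """Build a pyspark command line that puts the driver jars on the driver classpath."""
    jars = find_sqlserver_driver_jars(lib_dir)
    if not jars:
        raise FileNotFoundError("no SQL Server JDBC driver jar found in " + lib_dir)
    # Classpath entries are joined with os.pathsep (':' on Linux, e.g. HDInsight).
    return ["pyspark", "--driver-class-path", os.pathsep.join(jars)]
```

For example, `print(" ".join(pyspark_command("/path/to/lib")))` prints a command you can paste into the shell, where `/path/to/lib` is a placeholder for the folder holding the driver.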

  • You have to add the --driver-class-path option when starting the CLI and provide the path to the SQL driver so that it gets loaded.
  • Fortunately, in the case of Azure HDInsight, the SQL driver is already installed in the Hive lib folder.
pyspark --driver-class-path /usr/hdp/
  • Here is my code to connect to SQL Database and retrieve a table as a DataFrame:
df1 = spark.read.option('user','sparkdatabasetestsql@sparkdatabasetestsql').option('password','xxxxxxxxxxx') \
    .jdbc('jdbc:sqlserver://;database=sparkdatabasetestsql;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*;loginTimeout=30;','SalesLT.Customer')

  • Getting the number of rows from the above DataFrame:
df1.count()

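The same read can also be expressed with Spark's generic `jdbc` data source, where the driver class is named explicitly through the `driver` option. This is a minimal sketch, assuming a Spark 2.x `spark` session object and Microsoft's driver class `com.microsoft.sqlserver.jdbc.SQLServerDriver`; the helper names `sqlserver_jdbc_url` and `read_sqlserver_table` are hypothetical. Naming the class does not replace `--driver-class-path`: the jar still has to be available to Spark.

```python
# Assumed class name of Microsoft's SQL Server JDBC driver.
SQLSERVER_DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def sqlserver_jdbc_url(server, database, **properties):
    """Build jdbc:sqlserver://<server>;database=<database>;key=value;...;"""
    parts = ["jdbc:sqlserver://" + server, "database=" + database]
    # Extra connection properties (e.g. encrypt, loginTimeout) become key=value pairs.
    parts += [f"{key}={value}" for key, value in properties.items()]
    return ";".join(parts) + ";"


def read_sqlserver_table(spark, url, table, user, password):
    """Read a table through Spark's JDBC data source and return a DataFrame."""
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        # Name the driver class explicitly; the jar must still be on the classpath.
        .option("driver", SQLSERVER_DRIVER_CLASS)
        .load()
    )
```

In the pyspark shell, usage would look like `df1 = read_sqlserver_table(spark, sqlserver_jdbc_url("<server>", "sparkdatabasetestsql", encrypt="true", loginTimeout=30), "SalesLT.Customer", user, password)`, where `<server>`, `user` and `password` are placeholders for your own values.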
Prashanth Madi

I'm a programmer & tech enthusiast. I work for the Big Data Support Team at Microsoft, but this blog, its content & opinions are my own. I blog about tech, gadgets, code, where we're going & where we've been.