Apache Spark: how to import data from a JDBC database using Python

2016-10-27

Using Apache Spark 2.0 and Python, I'll show how to import a table from a relational database (using its JDBC driver) into a Python DataFrame and save it as a Parquet file. In this demo the database is an Oracle 12.x. File jdbc-to-parquet.py:

```
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()

# Read the table myuser.dim_country through the Oracle JDBC driver
df = spark.read.format("jdbc").options(
    url="jdbc:oracle:thin:ro/ro@mydboracle.redaelli.org:1521:MYSID",
    dbtable="myuser.dim_country",
    driver="oracle.jdbc.OracleDriver").load()

# Save the DataFrame as a Parquet file in the working directory
df.write.parquet("country.parquet")
```
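The script assumes the Oracle JDBC driver is already on Spark's classpath, for example by launching it with `spark-submit --jars /path/to/ojdbc7.jar jdbc-to-parquet.py` (the exact jar name and path depend on your Oracle client installation). To check that the import worked, here is a minimal sketch, assuming country.parquet was written to the current working directory, that reads the Parquet file back and prints its schema and a few rows:

```
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Verify country.parquet") \
    .getOrCreate()

# Read back the Parquet file written by jdbc-to-parquet.py
countries = spark.read.parquet("country.parquet")

# The schema comes from the Parquet metadata (originally inferred from the Oracle table)
countries.printSchema()
countries.show(5)
```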

