Monday, June 13, 2016

IPython/Jupyter Notebooks for Oracle

Topic: In this short post you can find examples of how to use IPython/Jupyter notebooks for running SQL on Oracle.

IPython/Jupyter notebooks are one of the leading free platforms for data analysis, with many advantages, notably the interactive web-based interface and a large ecosystem of readily available packages for data analysis and visualization. Moreover IPython/Jupyter notebooks are a very handy format for sharing code and data as you will see in the examples.
See also this blog post with examples on how to use Jupyter for querying Apache Impala.

It is of interest to integrate many data sources into Jupyter notebooks to make the platform versatile and to fulfill many different use cases. In this short post you can find examples of how to query data from Oracle using Jupyter notebooks and simple integration with pandas and matplotlib.

Notebook Short description
Oracle_IPython_sqlplus Examples of how to use sqlplus inside Jupyter notebooks. It is based on the use of %%bash cell magic and here documents to wrap up sqlplus inside Jupyter cells.
Oracle_IPython_cx_Oracle_pandas Examples of how to query Oracle from Python using cx_Oracle and how to integrate with pandas and visualization with matplotlib.
Oracle_IPython_SQL_magic Examples of how to query Oracle using %sql line magic (or %%sql cell magic) and of the integration with cx_Oracle and pandas.

Dependencies and pointers to build a test environment:

  • Install IPython and Jupyter. The following assumes Anaconda from Continuum Analytics)
  • Install the Oracle client
    • Download the software from OTN:
    • On that same link you can find the installation instructions (scroll down by the end of the page)
    • Example of the actions: perform ln -s (12c version) and export LD_LIBRARY_PATH={oracle client home}
    • I have tested this with Oracle client versions and on Linux
  • Post client installation:
    • set environment: export ORACLE_HOME={path to the Oracile client installation}
    • If not already installed, install libaio (yum install libaio)
  • Check that the Oracle client works and all dependencies are set by running sqlplus from the Oracle client home, example:
    • check client connectivity with: sqlplus username/password@dbserver:port/service_name
  • Install cx_Oracle, for example with pip install cx_Oracle
  • Install ipython-sql
Reference: a web page by Julian Dyke with examples on how to use cx_Oracle

1 comment:

  1. Great article. fyi for those who are using CDH, Oracle client can be installed on whole cluster with a few mouse clicks using parcels. Just add
    to "Remote Parcel Repository URLs" in Cloudera Manager.