IPython/Jupyter notebooks are one of the leading free platforms for data analysis, with many advantages, notably the interactive web-based interface and a large ecosystem of readily available packages for data analysis and visualization. Moreover IPython/Jupyter notebooks are a very handy format for sharing code and data as you will see in the examples.
See also this blog post with examples on how to use Jupyter for querying Apache Impala.
It is of interest to integrate many data sources into Jupyter notebooks to make the platform versatile and to fulfill many different use cases. In this short post you can find examples of how to query data from Oracle using Jupyter notebooks and simple integration with pandas and matplotlib.
Notebook | Short description |
---|---|
Oracle_IPython_sqlplus | Examples of how to use sqlplus inside Jupyter notebooks. It is based on the use of %%bash cell magic and here documents to wrap up sqlplus inside Jupyter cells. |
Oracle_IPython_cx_Oracle_pandas | Examples of how to query Oracle from Python using cx_Oracle and how to integrate with pandas and visualization with matplotlib. |
Oracle_IPython_SQL_magic | Examples of how to query Oracle using %sql line magic (or %%sql cell magic) and of the integration with cx_Oracle and pandas. |
Dependencies and pointers to build a test environment:
- Install IPython and Jupyter. The following assumes Anaconda from Continuum Analytics)
- Install the Oracle client
- Download the software from OTN: http://www.oracle.com/technetwork/topics/linuxx86-64soft-092277.html
- On that same link you can find the installation instructions (scroll down by the end of the page)
- Example of the actions: perform
ln -s libclntsh.so.12.1 libclntsh.so
(12c version) andexport LD_LIBRARY_PATH={oracle client home}
- I have tested this with Oracle client versions 12.1.0.2 and 11.2.0.4 on Linux
- Post client installation:
- set environment:
export ORACLE_HOME={path to the Oracile client installation}
- If not already installed, install libaio (yum install libaio)
- set environment:
- Check that the Oracle client works and all dependencies are set by running sqlplus from the Oracle client home, example:
- check client connectivity with:
sqlplus username/password@dbserver:port/service_name
- check client connectivity with:
- Install cx_Oracle, for example with
pip install cx_Oracle
- Install ipython-sql
- download from https://github.com/LucaCanali/ipython-sql
cd ipython-sql
and(sudo) python setup.py install
- Optionally install the official version on PyPi using
pip install ipython-sql
Great article. fyi for those who are using CDH, Oracle client can be installed on whole cluster with a few mouse clicks using parcels. Just add
ReplyDeletehttp://archive.cloudera.com/oracle-instant-client/11/parcels/latest/
to "Remote Parcel Repository URLs" in Cloudera Manager.