
Monday, September 8, 2025

Troubleshoot I/O & Wait Latency with OraLatencyMap and PyLatencyMap

TL;DR: When your database “feels slow,” metrics are key for investigations; however, relying on averages may mislead you. Wait latency histograms and heat-map visualization help you understand where and how wait time is spent, so you can fix what actually matters.


I recently chased an Oracle performance issue where most reads were sub-millisecond (cache), but a thin band around ~10 ms (spindles) dominated total wait time. Classic bimodal latency: the fast band looked fine in averages, yet the rare slow band owned the delay.

To investigate, and prove it, I refreshed two of my old tools:

  • OraLatencyMap (SQL*Plus script): samples Oracle’s microsecond wait-event histograms and renders two terminal heat maps with wait event latency details over time

  • PyLatencyMap (Python): a general latency heat-map visualizer that reads record-oriented histogram streams from Oracle scripts, BPF/bcc, SystemTap, DTrace, trace files, etc.

Both now have fresh releases with minor refactors and dependency checks.


Why heat maps for wait latency?

Latency data naturally forms histograms (bucketed counts by latency). That’s a 2D view. But performance evolves over time, adding a third dimension (latency × time × magnitude). A heat map projects this 3D story onto your terminal so you can spot patterns at a glance.

Why two heat maps?

  • Frequency heat map answers: Are many calls slow? (events/sec)

  • Intensity heat map answers: Are a few slow calls consuming most time? (time/sec)

In my case, frequency showed a bright <1 ms band (healthy) while intensity lit up at ~10 ms (spindles). Rare tail, real culprit.

Reading the canvas

  • Y-axis = latency buckets (displayed in ms).

  • X-axis = time, newest at the right.

  • Top = Frequency (events/sec). Bottom = Intensity (time/sec).

Look for bands (stable tiers), streaks (bursts), and hot tails (small but expensive).
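
To make the frequency/intensity distinction concrete, here is a minimal Python sketch (illustrative only, not code from either tool) that derives one time column of each heat map from two successive snapshots of a cumulative latency histogram: frequency is the per-bucket count delta divided by the sampling interval, and intensity weights each delta by a representative latency for the bucket. The bucket values below are made up.

# Minimal sketch (illustrative only): derive one time column of the
# Frequency and Intensity heat maps from two cumulative histogram snapshots.
# Keys are bucket upper bounds in microseconds, values are cumulative wait counts.

interval_s = 3  # sampling interval in seconds

prev = {1024: 100000, 2048: 100500, 8192: 100600, 16384: 101000}   # snapshot at t0
curr = {1024: 130000, 2048: 100700, 8192: 100650, 16384: 101400}   # snapshot at t0 + interval

frequency = {}   # events/sec per latency bucket
intensity = {}   # seconds of wait time per second, per latency bucket

for bucket_us, curr_count in sorted(curr.items()):
    delta = curr_count - prev.get(bucket_us, 0)
    frequency[bucket_us] = delta / interval_s
    # approximate each wait in the bucket with a representative latency,
    # here 3/4 of the bucket upper bound (a rough midpoint on a log2 scale)
    representative_s = 0.75 * bucket_us * 1e-6
    intensity[bucket_us] = delta * representative_s / interval_s

for bucket_us in sorted(curr):
    print(f"<= {bucket_us / 1000:8.1f} ms : "
          f"{frequency[bucket_us]:10.1f} ev/s   "
          f"{intensity[bucket_us]:8.4f} s_wait/s")

Run against real snapshots, the fast band dominates the frequency column while the slow band dominates the intensity column, which is exactly the bimodal pattern described above.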

Example output from OraLatencyMap v1.3

This example shows the latency heatmap measured and displayed with OraLatencyMap for the db file sequential read event.
The system is experiencing a bimodal latency distribution, indicating two distinct latency patterns:

  • Reads from fast storage (SSD) with latency < 1 ms (visible in the Frequency Heatmap, blue area).
  • Reads from slower storage (spinning disks) with latency ≈ 10 ms (visible in the Intensity Heatmap, yellow-red areas).



Quick starts

Oracle (OraLatencyMap)

OraLatencyMap is a command-line script/widget for Oracle databases.

Prerequisites
• SQL*Plus
• Privileges: SELECT on GV$EVENT_HISTOGRAM_MICRO and EXECUTE on DBMS_LOCK

Basic usage
SQL> @OraLatencyMap
SQL> @OraLatencyMap 3 "db file sequential read"   -- sample single-block reads every 3s
SQL> @OraLatencyMap 5 "log file sync"             -- check commit stalls / commit latency

Tip: Quote event names exactly as shown (including spaces).

Advanced driver (more control)
Syntax:
@OraLatencyMap_advanced <interval_s> "<event_name>" <bins> <columns> "<where_clause>"

Example (focus on single-block reads, 5s sampling, 11 bins, 100 columns, only inst_id=1):
SQL> @OraLatencyMap_advanced 5 "db file sequential read" 11 100 "and inst_id=1"

Parameter notes
• <interval_s> — sampling interval in seconds
• <event_name> — Oracle wait event to analyze (quoted)
• <bins> — number of latency buckets
• <columns> — width (time axis) of the heat map
• <where_clause> — optional extra filter (e.g., RAC: and inst_id=1)

Which wait events to start with?
• db file sequential read — typical for random single-block I/O; good starting point for read latency issues.
• log file sync — measures commit latency; use when users report slow commits.
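
If you want to look at the raw data OraLatencyMap samples, the sketch below polls GV$EVENT_HISTOGRAM_MICRO directly from Python. This is only an illustration, not part of the tool: it uses the python-oracledb driver rather than SQL*Plus, and the credentials, polling loop, and output format are assumptions for the example.

# Illustrative sketch: poll the cumulative microsecond histogram for one wait
# event, the same view OraLatencyMap samples. Assumes python-oracledb and a
# reachable database; credentials and DSN below are placeholders.
import time
import oracledb

conn = oracledb.connect(user="system", password="oracle",
                        dsn="localhost:1521/FREEPDB1")

query = """
    SELECT inst_id, wait_time_micro, wait_count
    FROM gv$event_histogram_micro
    WHERE event = :event_name
    ORDER BY inst_id, wait_time_micro
"""

with conn.cursor() as cur:
    for _ in range(3):                        # three samples, 3 seconds apart
        cur.execute(query, event_name="db file sequential read")
        for inst_id, bucket_us, count in cur:
            print(f"inst {inst_id}  <= {bucket_us} us : {count} waits (cumulative)")
        print("---")
        time.sleep(3)

The counters are cumulative since instance startup, which is why OraLatencyMap works on deltas between successive samples, as sketched earlier.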



Linux & everything else (PyLatencyMap)

PyLatencyMap is a Python tool: a terminal-based visualizer for latency histograms, intended to help with performance tuning and troubleshooting. It renders two scrolling heat maps (Frequency and Intensity) so you can see how latency distributions evolve over time. It works from the command line and plays nicely with sources that output latency histograms (Oracle wait histograms, BPF/bcc, DTrace, SystemTap, trace files, etc.).

Install:

pip install PyLatencyMap

latencymap --help           # or: python -m LatencyMap --help

Replay sample data:

cat SampleData/example_latency_data.txt | latencymap

# slower playback:
cat SampleData/example_latency_data.txt | latencymap --screen_delay=0.2

Use it live with your sources (examples):

# Oracle microsecond histograms → heat maps
sqlplus -S / as sysdba \
  @Event_histograms_oracle/ora_latency_micro.sql "db file sequential read" 3 \
  | latencymap

# System instrumentation (BPF/bcc): measure Linux block I/O latency
# (run as root; requires bcc to be installed)
python -u BPF-bcc/pylatencymap-biolatency.py -QT 3 100 | latencymap



What’s new

  • OraLatencyMap v1.3 – updated to use GV$EVENT_HISTOGRAM_MICRO by default, small refactors.

  • PyLatencyMap v1.3 – cleaner CLI, better record parsing, examples refreshed (Oracle, BPF/bcc, SystemTap, DTrace).


When to reach for which

  • Suspect Oracle waits? Start with OraLatencyMap on db file sequential read or log file sync.

  • Need cross-stack visibility (OS/storage/trace)? Use PyLatencyMap and feed it histograms from your favorite tracer.


See it, prove it, and fix it

Bottom line: If you only look at averages, you may miss complex behaviors like multimodal I/O (fast I/O with a slow tail). Heat maps also show you the time evolution of the latency patterns. With two small tools and two heat maps, you can see it, prove it, and fix it.

Monday, March 17, 2025

ATLAS DCS Analysis with Apache Spark and Jupyter Notebooks

The ATLAS Detector Control System (DCS) at CERN is essential for ensuring optimal detector performance. Each year, the system generates tens of billions of time-stamped sensor readings, presenting considerable challenges for large-scale data analysis. Although these data are stored in Oracle databases that excel in real-time transactional processing, the configuration—optimized with limited CPU resources to manage licensing costs—makes them less suited for extensive historical time-series analysis.

To overcome these challenges, a modern data pipeline has been developed that leverages Apache Spark, CERN’s Hadoop service, and the Service for Web-based Analysis (SWAN) platform. This scalable, high-performance framework enables researchers to efficiently process and analyze DCS data over extended periods, unlocking valuable insights into detector operations. By integrating advanced big data technologies, the new system enhances performance monitoring, aids in troubleshooting Data Acquisition (DAQ) link failures, and supports predictive maintenance, thereby ensuring the continued reliability of the ATLAS detector systems.

Note: this blog post is a reduced version of the article Advancing ATLAS DCS Data Analysis with a Modern Data Platform by Luca Canali, Andrea Formica and Michelle Solis.


The Data Pipeline: From Storage to Analysis

Figure 1: Overview of the Big Data architecture for Detector Control System (DCS) data analysis. The system integrates data from Oracle databases (including DCS, luminosity, and run information) and file-based metadata and mappings into the Hadoop ecosystem using Parquet files. Apache Spark serves as the core processing engine, enabling scalable analysis within an interactive environment powered by Jupyter notebooks on CERN SWAN. Reproduced with permission from Advancing ATLAS DCS Data Analysis with a Modern Data Platform.


Data Storage in Oracle Databases

The ATLAS Detector Control System (DCS) data is primarily stored in Oracle databases using a commercial product, the WinCC OA system, optimized for real-time monitoring and transactional operations. Each detector’s data is managed within dedicated database schemas, ensuring structured organization and efficient access.

At the core of this storage model is the EVENTHISTORY table, a high-volume repository that records sensor IDs, timestamps, and measurement values across thousands of monitoring channels. This table grows rapidly, exceeding one billion rows annually, requiring advanced partitioning strategies to facilitate efficient data access. To improve performance, range partitioning is implemented, segmenting the table into smaller, manageable partitions based on predefined time intervals, such as monthly partitions.

Since direct querying of this vast dataset for large-scale analysis can impose a heavy load on the production Oracle systems, a read-only replica is used as the data source for many querying use cases and for data extraction into CERN’s Hadoop-based analytics platform. This approach ensures that the primary database remains unaffected by analytical workloads, allowing detector experts to access and process historical data efficiently without impacting real-time operations.


Leveraging CERN’s Hadoop Service

To address the challenges of handling large-scale DCS data analysis, CERN’s Hadoop cluster, Analytix, provides a scalable and high-performance infrastructure tailored for parallelized computation and distributed storage. With over 1,400 physical cores and 20 PB of distributed storage, it enables efficient ingestion, processing, and querying of massive datasets.

Currently, approximately 3 TB of DCS data—representing 30% of the total available records—has been migrated into the Hadoop ecosystem, covering data from 2022 onward. Data extraction is performed via Apache Spark, leveraging the Spark JDBC connector to read from the read-only Oracle replica. Daily import jobs incrementally update the core EVENTHISTORY table, appending new records without reprocessing the entire dataset. Smaller, less dynamic tables undergo full replacements to maintain consistency.

For optimized storage and performance, all ingested data is converted to Apache Parquet format, a columnar storage system designed for high-speed analytical queries. The dataset is partitioned by day, enabling partition pruning—a technique that allows queries to efficiently filter relevant time slices, significantly reducing query execution times. The system can use Spark's parallel processing to rapidly process queries that target billions of individual data rows, completing such operations in just a few seconds and making it an ideal solution for correlation studies, anomaly detection, and long-term trend analysis of detector performance.
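
As an illustration of partition pruning from a notebook, the hypothetical PySpark snippet below reads only one week of daily partitions; the HDFS path and column names are placeholders for this example, not the actual production schema.

# Hypothetical example of partition pruning on the day-partitioned Parquet data.
# The HDFS path and column names are placeholders, not the production schema.
from pyspark.sql.functions import avg, col

df = spark.read.parquet("hdfs://analytix/path/to/eventhistory_parquet")

# Filters on the partition columns are pushed down, so only the selected
# daily partitions are scanned instead of the full multi-TB dataset.
one_week = df.filter(
    (col("year") == 2024) & (col("month") == 5) &
    (col("day").between(1, 7))
)

(one_week
 .groupBy("day", "element_name")
 .agg(avg("value_number").alias("avg_value"))
 .orderBy("day")
 .show(20, truncate=False))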

This modern data pipeline integrates seamlessly with CERN’s Jupyter notebooks service (SWAN), providing detector experts with a Python-based interactive environment for exploratory data analysis, visualization, and machine learning applications. The combination of Apache Spark, Parquet, and Hadoop enables the scalable processing of DCS data, facilitating key analyses such as monitoring DAQ link instabilities, tracking high-voltage performance, and diagnosing hardware failures in the ATLAS New Small Wheel (NSW) detector.


The Role of Apache Spark

Apache Spark plays a pivotal role in transforming how this data is accessed and analyzed. The Spark-based data pipeline extracts data from a read-only replica of the primary production database, ensuring minimal disruption to live operations. Using JDBC connectivity, Spark jobs are scheduled to run daily, incrementally updating Parquet files stored in CERN’s Hadoop cluster.

Key optimizations include:

  • Partitioning: Data is partitioned by day to facilitate faster querying and improved storage efficiency.

  • Incremental Updates: Only new data is ingested daily, preventing redundant processing (a sketch follows this list).

  • Columnar Storage with Parquet: Apache Parquet enables efficient data retrieval, reducing query execution time and storage costs.
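
To make the incremental-update idea concrete, here is a hedged sketch of what a daily import job could look like: it pushes a timestamp predicate into the Oracle query so only rows newer than the latest ingested timestamp are transferred, then appends them to the day-partitioned Parquet dataset. Table, column, path, and connection names are placeholders; the actual production job may differ.

# Sketch of an incremental daily import (illustrative; all names are placeholders).
from pyspark.sql.functions import col, year, month, dayofmonth

db_user = "system"                               # placeholder credentials
db_pass = "oracle"
db_connect_string = "localhost:1521/FREEPDB1"
output_path = "hdfs://analytix/path/to/eventhistory_parquet"

# 1. Find the most recent timestamp already present in the Parquet copy.
last_ts = (spark.read.parquet(output_path)
                .agg({"timestamp": "max"})
                .collect()[0][0])

# 2. Push a predicate into the Oracle query so only new rows are transferred.
incremental_query = f"""
    SELECT * FROM EVENTHISTORY
    WHERE timestamp > TO_TIMESTAMP('{last_ts}', 'YYYY-MM-DD HH24:MI:SS.FF')
"""

new_rows = (spark.read.format("jdbc")
                 .option("url", f"jdbc:oracle:thin:@{db_connect_string}")
                 .option("driver", "oracle.jdbc.driver.OracleDriver")
                 .option("query", incremental_query)
                 .option("user", db_user)
                 .option("password", db_pass)
                 .option("fetchsize", 10000)
                 .load())

# 3. Append the new rows to the existing day-partitioned dataset.
(new_rows
 .withColumn("year", year(col("timestamp")))
 .withColumn("month", month(col("timestamp")))
 .withColumn("day", dayofmonth(col("timestamp")))
 .write.mode("append")
 .partitionBy("year", "month", "day")
 .parquet(output_path))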


Extracting Data from Oracle using Apache Spark


Below is an example of how to create a Spark DataFrame that reads from an Oracle table using JDBC:

Run Oracle Free 23ai in a container from the gvenzl Docker Hub repo https://github.com/gvenzl/oci-oracle-free

  • docker run -d --name mydb1 -e ORACLE_PASSWORD=oracle -p 1521:1521 gvenzl/oracle-free:23-slim
  • wait until the DB is fully started by following the startup log: docker logs -f mydb1
You need the Oracle JDBC client jar, available on Maven Central or from the Oracle website:
bin/pyspark --packages com.oracle.database.jdbc:ojdbc11:23.7.0.25.01


Edit with the target database username:

db_user = "system"

Database server connection string (modify for the actual setup):

db_connect_string = "localhost:1521/FREEPDB1"

Database password:

db_pass = "oracle"

Query to extract data from the target database (example query):

myquery = "SELECT rownum AS id FROM dual CONNECT BY level<=10"

Mapping the Oracle query/table to a Spark DataFrame:

df = (spark.read.format("jdbc")
           .option("url", f"jdbc:oracle:thin:@{db_connect_string}")
           .option("driver", "oracle.jdbc.driver.OracleDriver")
           .option("query", myquery)
           .option("user", db_user)
           .option("password", db_pass)
           .option("fetchsize", 10000)
           .load())

Show schema and data for testing purposes:

df.printSchema()
df.show()


For more details on using Spark to read data from Oracle databases, see this note.


Implementing Time Partitioning

To efficiently partition the data by time, custom post-processing code in PySpark is used. Below is an example of how the partitioning is applied:

Import necessary functions:

from pyspark.sql.functions import col, year, month, dayofmonth

Read data from Oracle as a DataFrame:

df = (spark.read.format("jdbc")
           .option("url", f"jdbc:oracle:thin:@{db_connect_string}")
           .option("driver", "oracle.jdbc.driver.OracleDriver")
           .option("dbtable", "EVENTHISTORY")
           .option("user", db_user)
           .option("password", db_pass)
           .option("fetchsize", 10000)
           .load())

# Extract partitioning keys (year, month, day) from the 'timestamp' column

df = df.withColumn("year", year(col("timestamp"))) \
       .withColumn("month", month(col("timestamp"))) \
       .withColumn("day", dayofmonth(col("timestamp")))

# Write the DataFrame as Parquet files partitioned by year, month, and day

output_path = "hdfs://path/to/output_directory"

df.write.partitionBy("year", "month", "day").parquet(output_path)


For more details on writing data to Parquet with Spark, see this note.


Analysis Framework: A User-Friendly Approach

Apache Spark as the Core Processing Engine

The Apache Spark ecosystem allows for seamless querying and processing of vast datasets. Spark DataFrames and Spark SQL APIs offer a familiar and flexible interface for data manipulation, similar to Pandas for Python users. By enabling distributed computation, Spark ensures that billions of rows can be processed within seconds.
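
As a small illustration of that workflow, the hypothetical snippet below registers the Parquet copy as a temporary view and runs a Spark SQL aggregation over it; the path and column names are assumptions made for the example.

# Illustrative only: query the day-partitioned Parquet copy with Spark SQL.
# The path and column names are placeholders, not the production schema.
df = spark.read.parquet("hdfs://analytix/path/to/eventhistory_parquet")
df.createOrReplaceTempView("eventhistory")

daily_stats = spark.sql("""
    SELECT year, month, day,
           COUNT(*)          AS n_readings,
           AVG(value_number) AS avg_value,
           MAX(value_number) AS max_value
    FROM eventhistory
    WHERE year = 2024 AND month = 5      -- partition pruning limits the scan
    GROUP BY year, month, day
    ORDER BY year, month, day
""")

daily_stats.show()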

Benefits of Spark in the ATLAS DCS framework:

  • Scalability and Performance: Spark efficiently uses the available cores on each node and distributes workloads across multiple nodes.

  • Powerful APIs: Spark natively offers the DataFrame API and also makes the SQL language available; both provide expressive interfaces for data manipulation and good performance.

  • Fault Tolerance: Spark has a proven architecture that provides automatic recovery and retries from many types of failures in a distributed environment.


Platform integration with Jupyter notebooks and Spark

Front-end analysis is conducted via Jupyter notebooks on CERN’s SWAN platform, offering researchers an interactive and intuitive interface. Key capabilities include:

  • Spark integration: A dedicated component, the Spark Connector, abstracts the complexities of Spark configuration, ensuring seamless interaction with the Hadoop ecosystem.

  • Python environment and Dynamic Visualization: The platform harnesses the robust Python ecosystem for data processing, enabling the dynamic creation of tables, charts, and plots.

  • Data Integration: Seamless connectivity to diverse data sources—including Oracle databases and web services—simplifies the integration process, providing comprehensive access to all relevant data.


Figure 2: Analysis of ATLAS Detector Control System (DCS) data using Python and Apache Spark. The figure highlights specific elements of the ATLAS New Small Wheel (NSW) MicroMegas (MMG) subdetector that exhibited unstable behavior, prompting further investigation. This visualization was generated using a modern data platform that integrates Jupyter notebooks, CERN’s Hadoop service, and Spark-based analytics. The approach enables large-scale processing and efficient troubleshooting of detector performance. Reproduced with permission from Advancing ATLAS DCS Data Analysis with a Modern Data Platform.


Future Enhancements

To further optimize scalability, performance, and analytical capabilities, we are exploring several key improvements:

  • Kubernetes for Spark Orchestration: Moving from a Hadoop-based cluster to Kubernetes-managed Spark deployments will streamline resource allocation, optimize workload scheduling, and enable dynamic scaling during peak analysis periods. This transition also facilitates a smoother shift toward cloud-based architectures.

  • Cloud Storage Solutions: We are evaluating cloud-based storage options such as Amazon S3, which would further ease migration to a cloud environment and enhance data accessibility and scalability.

  • Advanced Data Formats: We are considering the adoption of modern data formats like Apache Iceberg and Delta Lake. These formats offer improved data ingestion workflows, better query performance and support for evolving data schemas, and enhanced data management capabilities in general.

  • Machine Learning and AI Integration: Leveraging GPU resources available on CERN’s SWAN platform will enable advanced machine learning techniques for predictive analytics, anomaly detection, and automated troubleshooting. This integration aims to identify detector inefficiencies and potential failures in real time, ultimately improving operational reliability and reducing downtime.

These enhancements aim to future-proof the DCS data analysis framework, ensuring it remains a highly efficient, scalable, and adaptable platform for ongoing and future ATLAS detector operations.


Conclusion

The integration of Apache Spark with CERN’s Hadoop infrastructure and CERN’s notebook service (SWAN) has significantly enhanced ATLAS DCS data processing and analysis by enabling a scalable, high-performance, and user-friendly platform. This framework empowers researchers to extract meaningful insights, enhance detector performance monitoring, and streamline troubleshooting processes, significantly improving operational efficiency. As the project continues to evolve, the adoption of cloud-based storage, Kubernetes orchestration, and AI-driven analytics will further enhance the platform’s capabilities, supporting the needs of the scientific and engineering community.


Acknowledgements and Links

This work is based on the article Advancing ATLAS DCS Data Analysis with a Modern Data Platform by Luca Canali, Andrea Formica and Michelle Solis. Many thanks to our ATLAS colleagues, in particular from the ADAM (Atlas Data and Metadata) team and ATLAS DCS. Special thanks to the CERN Databases and Data Analytics group for their help and support with Oracle, Hadoop, SWAN and Spark services.



Tuesday, May 31, 2016

Linux BPF/bcc for Oracle Tracing

Topic: In this post you will find a short discussion and pointers to the code of a few sample scripts that I have written using Linux BPF/bcc and uprobes for Oracle tracing.

Previous work and motivations

Tools for dynamic tracing are very useful for troubleshooting and internals investigations of Oracle workloads. Dynamic tracing probes on the OS/kernel can be used, for example, to measure the details of I/O latency. Moreover, probes on Oracle userspace functions can be used to complement Oracle instrumentation and probe deeper into the internals of the Oracle engine when needed. For example, in work done in collaboration with Frits Hoogland, we have investigated how to link the Oracle wait event interface with tools able to probe the userspace, such as gdb and dynamic tracing tools like DTrace, SystemTap and Perf. More details and examples on this topic can be found in previous work: Modern Linux Tools for Oracle Troubleshooting (presentation at UKOUG Tech14), Life of an Oracle I/O: Tracing Logical and Physical I/O with SystemTap, Linux Perf Probes for Oracle Tracing, and Flame Graphs for Oracle.


What's new with BPF/bcc

BPF together with its frontend bcc are new and exciting technologies in the Linux dynamic tracing ecosystem. If you are not familiar with them, you can get up to speed using the excellent material in Brendan Gregg's blog. While the tools are still in development as I write this, a large number of important features are already available starting from kernel 4.5, notably the support for uprobes that I have used in the scripts discussed later in this post.

I have started experimenting with porting a few probes for the Oracle userspace that I had written using SystemTap and discussed in this blog. While the language/syntax is completely different between SystemTap and BPF/bcc, porting the probes has turned out to be relatively straightforward. The work has been made substantially easier by the availability of a large selection of well-written and documented scripts in the tools directory of bcc. At present, writing BPF/bcc scripts is a mixture of coding in Python and C, which feels to me both familiar and at the same time strangely low-level compared to more mature environments such as DTrace and SystemTap.

The example scripts that I have written using BPF/bcc are on Github. Here is a list with comments:

  • ora_sqlparse_trace.py: Tracing of Oracle SQL parsing. This script traces SQL hard parsing on Oracle binaries by hooking on the Oracle function "opiprs" and reads from function arguments (CPU registers) and from process memory.

  • ora_wait_trace.py: Tracing of Oracle wait events. This script traces Oracle sessions by hooking on the functions "kskthewt" and "kews_update_wait_time" and reads from function arguments (CPU registers).

  • ora_logicalIO_histogram.py: Logical IO latency histograms. This script measures the latency between call and return time for the Oracle function "kcbgtcr", which is an important part of the logical IO processing for consistent reads.

  • ora_wait_histogram.py: Wait event latency histograms. This script traces Oracle sessions by hooking on the functions "kskthewt" and "kews_update_wait_time" and reads from function arguments (CPU registers). BPF computes the latency histogram for the wait events and the script prints the values on stdout.
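
The logical I/O histogram script, for example, follows the standard bcc uprobe/uretprobe latency pattern: probe the function entry and return, and let BPF aggregate the elapsed time into a log2 histogram. Here is a generic sketch of that pattern; the Oracle binary path is a placeholder, and the actual scripts linked above add the Oracle-specific probes and argument handling.

#!/usr/bin/env python
# Generic bcc uprobe/uretprobe latency histogram (illustrative sketch).
# The binary path below is a placeholder; the Oracle scripts linked above
# hook functions such as kcbgtcr, kskthewt and kews_update_wait_time.
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u64, u64);          // tid -> entry timestamp
BPF_HISTOGRAM(dist);                // log2 latency histogram (microseconds)

int trace_entry(struct pt_regs *ctx) {
    u64 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u64 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                   // missed the entry probe
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    dist.increment(bpf_log2l(delta_us));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=bpf_text)
oracle_bin = "/u01/app/oracle/product/db/bin/oracle"   # placeholder path
b.attach_uprobe(name=oracle_bin, sym="kcbgtcr", fn_name="trace_entry")
b.attach_uretprobe(name=oracle_bin, sym="kcbgtcr", fn_name="trace_return")

print("Tracing... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("latency (us)")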

An example of the usage of ora_wait_histogram.py to measure and display wait event latency:

# stdbuf -oL ./ora_wait_histogram.py 10 10|sed -e 's/event# = /event#=/g' -f eventsname.sed
Start tracing oracle wait events... Hit Ctrl-C to end.

event=db file sequential read
     wait time, microsec : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 25       |                                        |
       128 -> 255        : 24521    |********************                    |
       256 -> 511        : 46788    |****************************************|
       512 -> 1023       : 12169    |**********                              |
      1024 -> 2047       : 1132     |                                        |
      2048 -> 4095       : 660      |                                        |
      4096 -> 8191       : 248      |                                        |
      8192 -> 16383      : 29       |                                        |


Latency heat maps to display histograms collected with BPF/bcc

PyLatencyMap is a command-line tool for visualizing latency histograms as heat maps on terminal graphics, using ANSI escape codes. PyLatencyMap can be used to investigate I/O performance for random I/O, and it is especially suited to cases with multiple modes of response time from the storage (SSD cache, HDD, latency outliers). The original idea for PyLatencyMap comes from the work of Brendan Gregg on latency heat maps. I have added to the PyLatencyMap repository a modified version of the biolatency.py script to measure I/O latency histograms: see the script pylatencymap_biolatency.py and the example driver Example11_BPF-bcc_blockIO_latency.sh, which generated the heat maps below:



The test workload has been generated using Kevin Closson's SLOB. Additional references with a more detailed discussion of testing Oracle I/O with SLOB and measuring latency heat maps are:
Heat Map Visualization of I/O Latency with SystemTap and PyLatencyMap, and OraLatencyMap v1.1 and Testing I/O with SLOB 2.


Notes on the test environment

The scripts discussed in this post have been developed on Fedora 24 (alpha) running Linux kernel version 4.6 (using the rawhide kernel) and have been tested on workloads generated using Oracle version 11.2.0.4. This is not an Oracle-supported configuration and the scripts are intended mainly as a demonstration of the technology and for learning purposes.
Here are some pointers on the steps I used to set up a lab environment for testing:

Note in particular the step for configuring the rawhide kernel, probably a good choice when testing BPF, as new features are being added on a regular basis as I write this:
# sudo dnf config-manager --add-repo=http://alt.fedoraproject.org/pub/alt/rawhide-kernel-nodebug/fedora-rawhide-kernel-nodebug.repo

# sudo dnf update

Additional pointers and recipes on how to install Oracle 11.2 on Fedora can be found on Tim Hall's website at: https://oracle-base.com/articles/11g/articles-11g


Conclusions

BPF and its bcc frontend are new and powerful tools for dynamic tracing on Linux. A discussion of the merits and shortfalls of BPF/bcc vs other existing solutions in the Linux dynamic tracing ecosystem is beyond the scope of this post. What you can find in this post are a few example scripts that I have written for tracing Oracle using BPF/bcc and uprobes, and an additional script for integrating BPF/bcc with PyLatencyMap, which provides heat-map visualization of the histograms generated using BPF/bcc.
Although the BPF/bcc environment is still evolving, it already appears to be a very useful addition to the toolbox for troubleshooting and performance investigations of Linux workloads. The set of example scripts/tools that comes with the bcc repository is of great help for getting started both with using BPF/bcc and with writing new scripts. BPF/bcc can only run on relatively new kernels (as I write this, I have tested the scripts discussed in this post on kernel version 4.6), and this is an obstacle for its adoption in many environments, at least in the short term.

Note added, February 2019: Red Hat and Oracle Linux 7.6 have backported BPF and can run bcc scripts (yum install bcc*). I have updated the script repository accordingly, see https://github.com/LucaCanali/Linux_tracing_scripts/tree/master/BPF-bcc_Userspace_Oracle


Credits and acknowledgements

Many of the original ideas and tools discussed here are inspired or directly derived from the awesome published work of Brendan Gregg.
Many thanks also to the development teams of BPF and bcc for providing and supporting these new powerful tools. In particular thanks to Brenden Blanco for his work on uprobes for bcc and for his support on issue #478.
The work of investigating Oracle userspace with dynamic tracing tools has been done in collaboration with Frits Hoogland.