Why: Latency histograms (and by extension wait event histograms) provide very useful information when troubleshooting performance for systems exhibiting response time with multi-mode distribution. In such cases average wait values are often not sufficient to understand the behavior of the system under study and histograms provide a finer level of details. A common case is with I/O performance investigations for many modern storage systems, which can serve I/O from HDD or flash, with completely different latency characteristics.
What: Event histogram metric is a command-line script for "near real-time" monitoring, based on the simple idea of taking two snapshots of the relevant V$ views and print the wait event histogram of the delta values, see also Figure 1 for an example. You can find an example of a case when event histogram metric has helped for performance troubleshooting in a previous blog entry at this link.
What's new: The latest version of event histogram metric (v 1.1, June 2015) adds the support for a little but very useful new feature of Oracle version 12.1.0.2, where histogram buckets go down to the microsecond level with V$EVENT_HISTOGRAM_MICRO. Another feature I have added event histogram metric v 1.1 is the calculation of the estimated time waited per bucket. This gives an indication of the relative weight/importance of the time waited in each latency bucket and allows, for example, to compare the relative weight of many low-latency waits vs. a few events with long latency.
Download the tool from http://cern.ch/canali/resources.htm or from https://github.com/lucacanali The relevant scripts for this post are:
- ehm.sql - a script to collect and visualize wait event histograms for Oracle RAC, 11g and 12c.
- ehm_local.sql - same as ehm.sql but collects data only for the local instance (V$ view data), 11g and 12c.
- ehm_micro.sql - collects and displays wait event histograms with microsecond buckets for RAC, requires Oracle 12.1.0.2.
- ehm_micro_local - same as ehm_micro but for the local instance (V$ view data), requires Oracle 12.1.0.2.
Example of usage
Wait events of interest
Wait event 'db file sequential read' is used by the Oracle engine to provide a measure of the time spent on single-block read operations. This is an important source of data when troubleshooting the performance of many OLTP workloads, as it accounts for the time spent on storage access for random I/O (additional details and comments on the limitations of this approach are detailed in the next paragraph). This is the typical data access path when using indexes, for operations like primary key look-ups, index range scans, nested loops joins, etc, that are quite common in many transactional workloads. A short investigation of the internals of Oracle and its interaction with the OS (see also this blog post for details) shows that in most cases the time accounted by 'db file sequential read' is essentially the time taken to complete the underlying system call call to read data from the storage. Therefore this event provides a good measure for the storage response time as seen from the database server. Moreover in RAC it is easy to aggregate wait event data for all instances in one histogram using GV$ views and this provide an easy source of data for the shared storage.
Note: in Exadata deployments instead of 'db file sequential read' you should expect to see the wait event 'cell single block physical read'.
Another important wait event for troubleshooting the performance of many Oracle databases with significant transactional workloads is 'log file sync'. This wait event measures the time a session will wait on commit operations. There are several components to the wait time accounted in log file sync (see paragraph below for some additional hints and known pitfalls), one of the most important for the scope of this article is the time necessary to perform I/O on the redo log files. Studying the latency of this event can be a very good source of information for performance troubleshooting as it may allow you to understand if writes to redo log files are performed at low latency, so most likely into the storage DRAM or SSD cache, or rather with high latency, which may indicate the lack of caching, or the cache going into write through mode or other issues worth further investigations.
Limitations and additional comments
When investigating latency and storage response time for random I/O, the wait event 'db file sequential read' is a very important source of data, however not all random I/O performed by Oracle is accounted under this wait event. When Oracle see the opportunity of optimizing storage access by using prefetching or batching I/O, it will use asynchronous I/O (when available). Asynchronous I/O for random reads will be accounted with the wait event 'db file parallel read'. Asynchronous I/O improves the utilization of the I/O subsystem by the Oracle processes, as it allows to submit and reap many I/O operations in groups, as opposed for example to reading blocks one by one as in the case of pread system calls. However this makes the task of correlating the wait event latency with storage response time much more difficult. It is the case of 'db file parallel read' but also the case of wait events associated to asynchronous I/O used for sequential I/O operations such as 'direct path read' More on this topic and the pitfalls when using the wait event interface for asynchronous I/O are detailed in this other blog post (see also Oracle Support Doc ID 7448407.8 "Bug 7448407 Resource Accounting and Tracing change for Direct Path Waits").
One known pitfall when interpreting the measurements of the 'log file sync' wait event is that the wait time includes also what can be a significant CPU workload. This has been discussed in details by Kevin Closson. There are also many other details, including a new adaptive mechanism for the communication between the server process and log writer introduced in 11.2.0.3 (see: adaptive log file sync ) and more changes introduced in 12c with redo writer worker processes. See also Oracle Support Doc ID 34592.1 "WAITEVENT: log file sync Reference Note " for a list of known issues and bugs in this area.
As an addendum to the discussion: this is the formula used in the script event histogram metric to compute the the estimated value for the wait time in each bucket: T_wait = 0.75 * bucket_max_val * wait_count. As an example of the usage of the formula, consider 100 wait event in the Oracle bucket 2 ms. This means that the wait time has been between 1 ms * 100 events = 100 ms and 2 ms * 100 event = 200 ms. According to the formula above the estimated value is 0.75 * 2 ms * 100, that is the mean value between the maximum was of 2 ms * 100 and minimim wait of 1 ms * 100.
Related work and links
The study of latency histograms as a function of time leads to a 3D representation problem. A natural solution for it is to represent latency values in a heat map. If you are interested in more details see this blog post on OraLatencyMap, a tool to collect and display heat maps of wait events using SQL*plus and this series of posts on PyLatencyMap, a python-based tool integrated with many data sources, including Oracle wait events, SystemTap and DTrace.
Event histogram metric is part of a group of short command line scripts I use for Oracle troubleshooting, including scripts for near-real time measurement of Oracle performance. The scripts are available for download, you can read more details following this link.
When troubleshooting Oracle performance and investigating I/O latency details it can be useful to get graphs of historical trends from the AWR repository. PerfSheet4 is a tool that I use to extract and visualize AWR data.
Conclusions
Latency histograms of Oracle wait events are a powerful drill-down tool for performance investigations. Histograms are necessary when average values do not provide an accurate picture, for example when studying I/O latency of storage with multi-mode response time. Two wait events are of particular interest for many Oracle workloads: db file sequential read and log file sync. In Oracle 12.1.0.2 a new view V$EVENT_HISTOGRAM_MICRO provides latency details with microsecond buckets, which is very useful when investigating SSDs and low-latency storage in general. A set of simple scripts for near real-time data collection and display of the histograms have been discussed in this post and are available for download.