Wednesday, March 18, 2015

Heat Map Visualization of I/O Latency with SystemTap and PyLatencyMap

Topic: PyLatencyMap v1.2 a tool for collecting and visualizing I/O latency data collected and about its integration with SystemTap.

Introduction: When studying storage performance latency drill-down can be very important. Measuring the average I/O latency is often not enough, latency histograms are proven to be more suitable for investigating modern storage systems. This is because the storage service time can present multiple "modes". A common case is with storage systems where I/O requests can be served by low-latency SSD or by spindles with much higher service time. For such systems latency histograms provide an additional dimension to the performance investigation.
Latency heat maps come into play when we want to visualize a series of latency histograms over time. This is a 3-D visualization problem and heat maps provide a way to represent the third dimension as color (see as an example Figure 1 later in this article).
The main ideas behind the use of latency histograms for investigating storage performance and their visualization heat maps have been previously detailed in the excellent paper by Brendan Gregg on ACM queue and also on blog articles and presentations.

PyLatencyMap is a tool and set of scripts for gathering  and displaying heat maps using the command line interface (supporting ANSI terminal graphics). The main use for which PyLatencyMap has been developed is to measure and display storage latency data. PyLatencyMap can also be used to display heat maps from a generic source of latency data.
The latest update to PylatencyMap, version 1.2, provides integration with SystemTap and a few example scripts for getting started. SystemTap can be used to collect storage latency and PylatencyMap for producing the associated heat maps. Many more data sources have already been integrated with PyLatencyMap and examples are available with the tool: Oracle wait event histograms, DTrace histograms and custom aggregations from trace files.
PyLatencyMap has a modular architecture, which follows the typical use of Unix/Linux CLI tools. Data is generated by a collector script (for example a SystemTap script), then it is piped through an optional filter and finally piped into to the visualization engine:

latency data_source | <optional connector script> | python LatencyMap.py <options>

A series of examples and get-started scripts are provided with the tool. In this article we will focus on the scripts based on SystemTap. For discussions on how to use PyLatencyMap for SQL*plus and DTrace see also this other blog entry. Just to get started see the following CLI to run data collection with SystemTap and visualization with PyLatencyMap:

stap -v SystemTap/blockio_rq_issue_basic_pylatencymap.stp |python SystemTap/systemtap_connector.py |python LatencyMap.py

The script blockio_rq_issue_basic_pylatencymap.stp measures the latency of block I/O operations and can be used to collect latency histograms and display them as heat maps. The probe hooks top kernel.trace("block_rq_issue") and kernel.trace("block_rq_complete").
There are three probes in the script: one that gathers information at each block io operation, another that collects stats at the end of the I/O and a third probe that periodically prints the latency histogram. Note the use of the aggregation operator '<<<'  that SystemTap uses to populate the histograms, which are later printed with the use 'hist_log'.

probe kernel.trace("block_rq_issue") {
        requestTime[$rq] = gettimeofday_us()
}

probe kernel.trace("block_rq_complete") {
   t = gettimeofday_us()
   s = requestTime[$rq]
   if (s > 0) {
       latencyTimes <<< (t-s)
       delete requestTime[$rq]
   }
}

probe timer.sec(3) {
   printf("\n<begin record>\n")
   printf("timestamp, microsec, %d, %s\n",gettimeofday_us(),tz_ctime(gettimeofday_s()))
   printf("label, Latency of block I/O requests measured with SystemTap\n")
   printf("latencyunit, microsec\n")
   printf("datasource, systemtap\n")
   if (@count(latencyTimes) > 0)
       println(@hist_log(latencyTimes))
   printf("\n<end record>\n")

}

SystemTap scripts for PyLatencyMap:

For production usage there are more advanced scripts than the one just discussed. The script  Example9_SystemTap_blockIO_req.sh is based on blockio_rq_issue_pylatencymap.stp and provides tracing based on block_rq_issue and block_rq_complete as the example above but contains additional filters to remove 'false positives'. For ease of use this is packed into

Another example script is Example9b_SystemTap_blockIO_filter_read_8KB.sh is based on blockio_rq_issue_filter_pylatencymap.stp which provides the possibility to further filter on the block I/O being issues, in particular the defaults filter for 8KB-reads, which are of interested when testing OLTP applications in Oracle.

Example9c_SystemTap_pread.sh is a scripts that packages a SystemTap probe on pread calls. These are typically the underlying OS calls for single block reads that show up as 'db file sequential read' wait events in Oracle. Note however that Oracle can do single-block reads also using asynchronous, therefore the probes discussed above on block_io trace points are more general than this one.

Examples 9e, 9f: the probes in these scripts measure block_io hooking on different probe points and are provided for reference. Note, these scripts require to have kernel debuginfo symbols.

oracle_event_latency.stp: this script measures the latency of Oracle wait events and produces histograms by hooking directly into the Oracle executable. More details on how this works can be found at the blog entry "SystemTap into Oracle".

Example:

Figure 1 here below is an example the output from PyLatencyMap run using the script Example9b_SystemTap_blockIO_filter_read_8KB.sh. The measured workload has been produced with SLOB (at increasing load). The top heat map represents the number of I/O operations per latency bucket and as a function of time. On the horizontal axis you can fine the time, on the vertical axis the latency buckets. The tool assumes a power-of-2 scale. Color is used to represent the number of I/O operations per second. A blue palette is used for the frequency heat map.
The  heat map on the lower half of the figure uses a yellow-red palette to represent the amount of time waited in each bucket. I refer to it as a intensity/importance heat map. It highlights the relative weight of latency time spent on each bucket and helps in comparing the impact of a large number of low-latency operations vs. a smaller number of high latency operations.


Figure 1:  The output of PyLatencyMap contains the latency histogram of the latest measured point and  two heat maps: on top the heat map of the IOPS, on the lower graph the heat map of the waited time.

Note: for SystemTap (and DTrace) data sources, the latency bucket has to be interpreted as follows: a value of N in bucket B, means N I/O operations with latency between B and 2*B.
Other histogram latency data sources (notably Oracle) use a different convention. With the same example a value of N in a latency bucket B for Oracle histograms means N waits of latency between B and B/2. PyLatencyMap takes care of this difference when calculating the time waited histograms (i.e. the
Note: The scripts discussed here for block I/O tracing use the trace point block_rq_issue, it is also possible to use the trace point block_rq_insert to measure I/O latency from the point of insertion into the queue instead.

References and download links:

PyLatencyMap can be downloaded from this link or can be forked from Github. Additional information on PyLatencyMap is also available at this blog entry and also at this other blog entry.

Conclusions and acknowledgements

SystemTap is a very powerful tool for measuring I/O and producing latency histograms for storage performance investigations. PyLatencyMap provides an easy way to visualize histograms produced by SystemTap as latency heat maps. PyLatencyMap collects data and displays heat maps using the command line interface on character terminals with ANSI escape codes. This provides the combined benefit of a graphical output and the power of the command line. PyLatencyMap can also be used to visualize heat maps from latency histograms produced with DTrace, Oracle's SQL*plus and/or a from latency data aggregated from text and trace files.

Many thanks go to Brendan Gregg for sharing in his blog many awesome ideas and tools on the topic of heat map visualization and on using DTrace and SystemTap that have provided a great source of inspiration for this work.

Note: this entry was originally posted in March 2015, with major updates in July 2015.