OraLatencyMap v1.1 is now available (see also the GitHub repository) with a few new features and bug fixes (v1.0 is described here). Many thanks to all who have already tried it and left a note on the blog or on Twitter.
The main new feature in v1.1 is an advanced mode that exposes a few more parameters for customization: the number of samples displayed in the map, the number of latency buckets, and the possibility to limit data collection to a subset of instances (relevant for RAC).
Another new feature is that the tool now displays the maximum of the sums of the displayed values (that is, the sum of the number of wait events per second and the sum of the time waited). This is intended to help identify peak performance values (for example the maximum number of IOPS).
README:
OraLatencyMap, a performance widget to visualize Oracle I/O latency using Heat Maps
Luca.Canali@cern.ch, v1.1, May 2013
Credits: Brendan Gregg for "Visualizing System Latency", Communications of the ACM, July 2010; Tanel Poder (snapper, moats, sqlplus and color); Marcin Przepiorowski (topass)
Notes: These scripts need to be run from sqlplus in a terminal supporting ANSI escape codes.
It is better not to use rlwrap when running the tool, or the smoothness of the graphics will suffer.
Run as a privileged user (requires select on v$event_histogram and execute on dbms_lock.sleep).
Tested on 11.2.0.3, Linux x86_64.
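For a non-DBA user, a minimal sketch of the required grants (assumptions: grants must reference the underlying v_$ views rather than the v$ synonyms, and "scott" here is just a placeholder user; for RAC-wide collection, select on gv_$event_histogram may be needed as well):
SQL> grant select on v_$event_histogram to scott;
SQL> grant execute on dbms_lock to scott;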
How to start:
sqlplus / as sysdba
SQL> @OraLatencyMap
More examples:
SQL> @OraLatencyMap_event 3 "log file sync"
SQL> @OraLatencyMap_advanced 5 "db file sequential read" 12 80 "and inst_id=1"
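An inferred reading of the positional parameters of OraLatencyMap_advanced, based on the examples and feature list above (the script header is the authoritative reference): sampling interval in seconds (5), wait event name, number of latency buckets (12), number of samples displayed in the map (80), and an optional extra predicate such as "and inst_id=1" to limit data collection to a subset of RAC instances.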
Output: 2 latency heat maps of the given wait event
The top map represents the number of waits per second and per latency bucket
The bottom map represents the estimated time waited per second and per latency bucket
With the advanced script it is possible to customize the sampling time, event name and screen size.
Moreover, in RAC the default is to aggregate histogram data over all nodes, but this too is customizable.
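The heat maps are built from Oracle's wait event histogram data. A minimal sketch of the kind of data being sampled (illustrative only, not the tool's actual code; the tool effectively works with deltas of these cumulative counters between successive snapshots):
SQL> select inst_id, wait_time_milli, wait_count from gv$event_histogram
     where event = 'db file sequential read' order by inst_id, wait_time_milli;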
Scope: Performance investigations of wait events' latency. For example single block read latency with OraLatencyMap.sql
Related: OraLatencyMap_advanced.sql -> this is the main script for generic investigation of event latency with heat maps
OraLatencyMap_event.sql -> another script based on OraLatencyMap_advanced
OraLatencyMap_internal.sql -> the slave script where all the computation and visualization are done
OraLatencyMap_internal_loop.sql -> the slave script that runs several dozen iterations of the tool's engine
OraLatencyMap and storage testing with SLOB
OraLatencyMap was originally written for troubleshooting and drilling down into issues with production DBs. I find that OraLatencyMap can also help in the context of storage testing (for example when installing a new system or evaluating a new storage infrastructure). SLOB 2 by Kevin Closson is a solid reference and overall a great tool for testing storage with Oracle, in particular for testing random I/O activity. I have therefore used SLOB to drive the workload for the examples below.
The outline of this simple test: (1) generate test data with SLOB, (2) run the SLOB test for read-only random I/O with increasing load values, (3) run OraLatencyMap while the test is running (focusing on IOPS and latency values).
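A minimal sketch of step (2), assuming SLOB 2's runit.sh takes the number of concurrent sessions as its argument (the load values here are hypothetical; see the SLOB 2 manual for the exact calling convention):
for sessions in 8 16 32 64; do
    ./runit.sh $sessions    # one read-only random I/O run per load level
done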
The picture below shows the output of OraLatencyMap taken during 4 different runs of SLOB at increasing load (see also slob.conf below and the annotations on the graph).
The measured workload is almost entirely dominated by wait events of type "db file sequential read", that is, random single-block reads.
We can see that by increasing the load (the number of concurrent SLOB sessions) we can drive more IOPS out of the storage. At the same time we observe that latency increases with load.
How do we read IOPS with OraLatencyMap? The sum of the number of waits per second is the metric to look at. I have copied the measured IOPS values as annotations into the figure below.
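A back-of-the-envelope cross-check of the IOPS reading (a sketch, not the tool's code): take two snapshots of the cumulative counters a known number of seconds apart and divide the delta by the interval.
SQL> select sum(wait_count) from gv$event_histogram where event = 'db file sequential read';
-- IOPS is approximately (second snapshot - first snapshot) / elapsed seconds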
The storage system under test is a simple JBOD configuration of 2 storage arrays with 12 SAS 10K rpm disks per array. The storage is connected to the servers via Fibre Channel (8 Gbps). The database is Oracle 11.2.0.3 for Linux x86_64 with ASM. Storage is allocated on a normal redundancy disk group built with 23 disks from the 2 storage arrays.
Note also that after each run SLOB 2 produces a series of reports (including AWR and iostat) with much more information on the workload than what is available from the OraLatencyMap output alone.
SQL> @OraLatencyMap_advanced 10 "db file sequential read" 11 110 ""
A potential pitfall when testing storage is to run the tests with too little data, in particular with test data that fit in the storage controllers' cache. The figure below shows an example of exactly that: the test data were easily cached by the arrays (4 GB in total for this system). The net outcome is very high IOPS figures that simply don't make sense given the number and type of disks in the system.
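As a rough rule of thumb, a 10K rpm SAS disk sustains on the order of 100-150 random read IOPS, so the roughly two dozen disks of this system should top out at a few thousand IOPS; measured values far above that strongly suggest the reads are being served from the arrays' cache.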
Indeed the measured latency values confirm that we are mostly reading from cache: the majority of the measured wait events fall in the 1 ms latency bucket (wait time of 1 ms or less).
Note on the test: the main difference between this test and the one described above is the amount of data used. The SLOB parameter SCALE was set to 10000 for this test, versus 1000000 for the test discussed above.
SQL> @OraLatencyMap
Notes: slob.conf and other details regarding the tests. See the SLOB 2 manual for more information on the meaning of the parameters.
slob.conf:
UPDATE_PCT=0
RUN_TIME=200
WORK_LOOP=0
SCALE=1000000 #for test N.2 this is scaled down to 10000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=8
SHARED_DATA_MODULUS=0
How to create test data:
./setup.sh SLOB 128 #this needs about 1TB of space in the SLOB tablespace
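The 1 TB figure is consistent with SLOB's one-row-per-block data layout, assuming an 8 KB block size: 1,000,000 blocks per schema x 8 KB x 128 schemas is roughly 1 TB, excluding overhead.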
Relevant init.ora parameters to force Oracle not to use prefetching/batching optimizations:
db_cache_size=50m
_db_block_prefetch_limit = 0
_db_block_prefetch_quota = 0
_db_file_noncontig_mblock_read_count = 0
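A sketch of how these could be set when using an spfile (assumptions: an spfile is in use and a restart follows; underscore parameters are unsupported by Oracle, so set them on test systems only):
SQL> alter system set db_cache_size=50m scope=spfile;
SQL> alter system set "_db_block_prefetch_limit"=0 scope=spfile;
SQL> alter system set "_db_block_prefetch_quota"=0 scope=spfile;
SQL> alter system set "_db_file_noncontig_mblock_read_count"=0 scope=spfile;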