Monday, June 24, 2013

DTrace Explorations of Oracle Wait Events on Linux and Solaris

DTrace is a great tool for measuring and investigating latency in performance troubleshooting. DTrace is now coming to the Linux platform too, and I would like to share a few tests I did with it. The following is meant as a short technology exploration, but hopefully it also highlights what I believe is an area of great potential for performance investigations and troubleshooting of Oracle workloads: the integration of DTrace scripts with Oracle's wait event interface.

1 - Warmup: DTrace can probe OS calls coming from an Oracle workload. The following is a standard DTrace script to measure the latency of synchronous I/O calls. The code here is adapted for Linux x86-64 and works by tracing pread64 system calls. This makes sense when using ASM (without asmlib) and performing single-block reads (e.g. waiting for db file sequential read events in Oracle). On Solaris we would trace pread calls instead. The DTrace quantize aggregation collects the data into histograms, similar to Oracle's v$event_histogram. Finally the values are printed every 10 seconds (and the counters reset to zero). Simple, elegant, effective!

dtrace -n '
syscall::pread64:entry { self->s = timestamp; } 
syscall::pread64:return /self->s/ { @pread["ns"] = quantize(timestamp - self->s); self->s = 0; }
tick-10s {
printa(@pread);
trunc(@pread);
}'


2 - DTrace meets Oracle. DTrace can also trace user processes, using the pid provider. It can trace function calls, see which arguments are passed to them and what is returned. There is enormous potential in using this to investigate performance issues with Oracle. Let's see here how we can "tap" into Oracle's wait event instrumentation to link DTrace to wait events.
With a little experimentation, using the tools described later in this article, we find that in 11.2.0.3 Oracle calls kews_update_wait_time with the elapsed wait time in microseconds as arg1. Moreover, when kskthewt is called, arg1 is the wait event number (to be resolved by querying v$event_name). We can write the following test to print all wait events for a given Oracle process (OS process id 1737 in this example):

# this is for Oracle 11.2.0.3, Linux and Solaris: trace os process pid=1737 (edit os pid as relevant)
dtrace -n '
pid1737:oracle:kews_update_wait_time:entry {
     self->ela_time = arg1;
}
pid1737:oracle:kskthewt:entry {
     printf("event#= %d, time(mu sec) = %d", arg1, self->ela_time);
}'
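
The event number printed by the script can be resolved by querying v$event_name, as mentioned above. For example, event# 146 (which on the test system used here maps to db file sequential read, as shown in the sample output further below) can be looked up as follows; note that the mapping is version-dependent, so it should always be resolved on the instance being traced:

SQL> select event#, name, wait_class from v$event_name where event# = 146;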

In the following example we use the same technique to produce event histograms for 2 Oracle processes (2349 and 2350 in the example below). The measured time is in microseconds, which improves on the resolution of Oracle's v$event_histogram data:

# this is for Oracle 11.2.0.3, Linux and Solaris (edit os pids as relevant)
dtrace  -n '
pid2349:oracle:kews_update_wait_time:entry,pid2350:oracle:kews_update_wait_time:entry {
     self->ela_time = arg1;
}
pid2349:oracle:kskthewt:entry,pid2350:oracle:kskthewt:entry {
     @event_num[arg1] = quantize(self->ela_time);
}
tick-10s {
printa(@event_num);
trunc(@event_num);
}'

Sample output:

 1 294099                        :tick-10s        
             146 (edited: db file sequential read)
           value  ------------- Distribution ------------- count
             128 |                                         0
             256 |@                                        14
             512 |@@@@@                                    120
            1024 |@@@@@@@@                                 182
            2048 |@@@@@@@@@@@@@@                           320
            4096 |@@@@@@@@@@@@@                            302
            8192 |                                         5
           16384 |                                         4
           32768 |                                         0


3 - DTrace can query V$ views. DTrace's ability to snoop into user processes, and therefore into Oracle's engine, also extends to internal data structures. In the following example we use DTrace to read v$session data from the memory structure underlying the X$KSUSE fixed table.
The address of the relevant 'row' of X$KSUSE is taken from arg1 of kslwt_end_snapshot. After that we read the memory directly, triggered by the execution of kskthewt (other choices are possible). The memory offsets of the X$KSUSE fields of interest are port-specific and can be found with a debugger. Below are examples for Linux and Solaris.

# this is for Oracle 11.2.0.3, Linux, tested on OEL 6.4 x86-64 (edit os pid as relevant)
dtrace  -n '
pid1737:oracle:kslwt_end_snapshot:return {
     self->ksuse =arg1;
}
pid1737:oracle:kskthewt:entry {
     KSUSESEQ = *(uint16_t *) copyin(self->ksuse+5536,2);
     KSUSEOPC = *(uint16_t *) copyin(self->ksuse+5538,2);
     KSUSEP1  = *(uint64_t *) copyin(self->ksuse+5544,8);
     KSUSEP2  = *(uint64_t *) copyin(self->ksuse+5552,8);
     KSUSEP3  = *(uint64_t *) copyin(self->ksuse+5560,8);
     KSUSTIM  = *(uint32_t *) copyin(self->ksuse+5568,4);
     SQL_HASH = *(uint32_t *) copyin(self->ksuse+5796,4);
     printf("seq=%u, event#=%u, p1=%u, p2=%u, p3=%u, sql_hash=%u, time=%u", KSUSESEQ, KSUSEOPC, KSUSEP1, KSUSEP2, KSUSEP3, SQL_HASH, KSUSTIM);
}'

Sample output:

dtrace: description 'pid1737:oracle:kslwt_end_snapshot:return ' matched 2 probes
FUNCTION:NAME
kskthewt:entry seq=6634, event#=146, p1=4, p2=1345795, p3=1, sql_hash=334560939, time=10290
kskthewt:entry seq=6635, event#=445, p1=0, p2=215, p3=245, sql_hash=334560939, time=12
kskthewt:entry seq=6636, event#=197, p1=4, p2=1345796, p3=124, sql_hash=334560939, time=8499
kskthewt:entry seq=6637, event#=197, p1=4, p2=606850, p3=126, sql_hash=334560939, time=9898
kskthewt:entry seq=6640, event#=352, p1=1650815232, p2=1, p3=0, sql_hash=334560939, time=481
kskthewt:entry seq=6641, event#=348, p1=1650815232, p2=1, p3=0, sql_hash=1029988163, time=12

From the sample output above we can see that DTrace is able to read X$KSUSE (or at least several of its fields). This opens the possibility of collecting data with extended filters, performing aggregations, etc. For example, we can use this technique to filter all db file sequential read events for a given file number, for a given I/O wait time threshold and/or for a given sql_hash value. The possibilities are many and worth further exploration.
Note for db file sequential read: p1 = file number, p2 = block number, p3 = number of blocks (see also v$event_name).
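
The meaning of the p1/p2/p3 parameters for a given event can be cross-checked directly in v$event_name, for example:

SQL> select event#, name, parameter1, parameter2, parameter3
     from v$event_name
     where name = 'db file sequential read';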

Same script, but this time for Solaris:

# this is for Oracle 11.2.0.3, tested on Solaris 11 x86-64
dtrace -p 1737 -n '
pid$target:oracle:kslwt_end_snapshot:return {
     self->ksuse =arg1;
}
pid$target:oracle:kskthewt:entry {
     KSUSESEQ = *(uint16_t *) copyin(self->ksuse+5592,2);
     KSUSEOPC = *(uint16_t *) copyin(self->ksuse+5594,2);
     KSUSEP1  = *(uint64_t *) copyin(self->ksuse+5600,8);
     KSUSEP2  = *(uint64_t *) copyin(self->ksuse+5608,8);
     KSUSEP3  = *(uint64_t *) copyin(self->ksuse+5616,8);
     KSUSEPOBJ  = *(uint64_t *) copyin(self->ksuse+6232,8);     
     KSUSTIM  = *(uint32_t *) copyin(self->ksuse+5624,4);
     printf("seq=%u, event#=%u, p1=%u, p2=%u, p3=%u, obj#=%u, time=%u", KSUSESEQ, KSUSEOPC, KSUSEP1, KSUSEP2, KSUSEP3, KSUSEPOBJ, KSUSTIM);
}'

Tools, resources and credits: A great tool for investigating Oracle calls with DTrace is digger by Alexander Anokhin and Brendan Gregg. It works under Solaris (I could not make it work with the Linux port yet) and allows tracing all function calls and their arguments. Digger makes it quite easy to see, for example, where Oracle instruments wait events, by tracing: ./digger.sh -F -p <os pid> -f '*wt*'
Another great resource for ideas on how to trace Oracle functions is Frits Hoogland's blog and the article Profiling of Oracle function calls.
Tanel Poder has made a great video on DTracing the Oracle query execution engine, available on Enkitec TV.
Brendan Gregg is a reference when talking about DTrace and systems performance tuning; see his website and blog. See also Oracle's DTrace User Guide.
Paul Fox is the author of a port of DTrace for Linux (the other being Oracle's, of course): GitHub repository.

DTrace lab environment for Oracle: I have used a test environment built with VirtualBox and installed with Oracle Linux 6.4. As the DTrace port for these tests I have used dtrace4linux.
Note: in my Linux-based DTrace test environment I often get the error message "dtrace: processing aborted: Abort due to systemic unresponsiveness". This can be worked around by adding the -w flag when running DTrace scripts, that is, by allowing destructive actions.
It's also beneficial to have a Solaris test environment, where DTrace is more mature. In my tests this was installed under VirtualBox too.

Conclusions
DTrace is a powerful tool for exploring systems performance. In addition, DTrace can explore and snoop inside the workings of user-space programs. This opens the door to many uses for Oracle database investigations, troubleshooting and performance tuning. This short article just scratches the surface of what is possible when combining DTrace with Oracle's wait event interface. The author believes this is an area that deserves further exploration. The Linux port used for these tests is maintained by Paul Fox (Oracle's port does not have the pid provider at the time of this writing). However, it is worth noting that this port is not ready for production usage.

Thursday, May 30, 2013

OraLatencyMap v1.1 and Testing I/O with SLOB 2

Topic: OraLatencyMap v1.1 is an updated version of a performance tool aimed at collecting and displaying Oracle wait event histogram data as latency heat maps. We will also briefly discuss an example of the usage of OraLatencyMap in the context of storage testing.

OraLatencyMap v1.1 is now available (see also the GitHub repository) with a few new features and bug fixes (v1.0 is described here). Many thanks to all who have tried it already and left a note either on the blog or on Twitter.

The main new feature in v1.1 is an advanced mode allowing for a few more parameters and customization: the number of samples displayed in the map, the number of latency buckets and the option to limit data collection to a subset of instances (this is relevant for RAC).

Another new feature is that we now display the maximum value of the calculated sums of the displayed values (i.e. the sum of the number of wait events per second and the sum of the time waited). This is intended to help identify peak performance values (for example the maximum number of IOPS).

README:
OraLatencyMap, a performance widget to visualize Oracle I/O latency using Heat Maps
Luca.Canali@cern.ch, v1.1, May 2013

Credits: Brendan Gregg for "Visualizing System Latency", Communications of the ACM, July 2010, Tanel Poder (snapper, moats, sqlplus and color), Marcin Przepiorowski (topass)

Notes: These scripts need to be run from sqlplus from a terminal supporting ANSI escape codes. 
       Better not use rlwrap when running this, or graphics smoothness will suffer.
       Run from a privileged user (select on v$event_histogram and execute on dbms_lock.sleep)
       Tested on 11.2.0.3, Linux x86_64.

How to start:
      sqlplus / as sysdba
      SQL> @OraLatencyMap 

More examples:
      SQL> @OraLatencyMap_event 3 "log file sync"
      SQL> @OraLatencyMap_advanced 5 "db file sequential read" 12 80 "and inst_id=1"

Output: 2 latency heat maps of the given wait event
         The top map represents the number of waits per second and per latency bucket
         The bottom map represents the estimated time waited per second and per latency bucket
         with the advanced script it is possible to customize sampling time, event name, screen size 
         moreover in RAC, the default is to aggregate histogram data over all nodes, but this is customizable too

Scope:  Performance investigations of wait events' latency. For example single block read latency with OraLatencyMap.sql
         
Related: OraLatencyMap_advanced.sql      -> this is the main script for generic investigation of event latency with heat maps
         OraLatencyMap_event.sql         -> another script based on OraLatencyMap_advanced 
         OraLatencyMap_internal.sql      -> the slave script where all the computation and visualization is done
         OraLatencyMap_internal_loop.sql -> the slave script that runs several dozens of iterations of the tool's engine 

OraLatencyMap and storage testing with SLOB

OraLatencyMap was originally written for troubleshooting and drilling down into issues with production DBs. I find that OraLatencyMap can also be of help in the context of storage testing (say, for example, when installing a new system or evaluating a new storage infrastructure).
SLOB 2 by Kevin Closson is a solid reference and overall a great tool for testing storage with Oracle, in particular for testing random I/O activity. Therefore I have used SLOB to drive the workload for the examples below.

The outline of this simple test: (1) generate test data with SLOB, (2) run the SLOB test for read-only random I/O with increasing load values, (3) run OraLatencyMap while the test is running (focusing on IOPS and latency values).
The picture here below shows the output of OraLatencyMap taken during 4 different runs of SLOB with increasing load (see also slob.conf below and the annotations on the graph).
The measured workload is almost entirely dominated by wait events of the type "db file sequential read", that is random single-block reads.
We can see that by increasing the load (the number of concurrent SLOB sessions) we can drive more IOPS out of our storage. At the same time we observe that the latency increases with increasing load.
How to read IOPS with OraLatencyMap? The sum of the number of waits per second is the metric to look at. I have copied the measured values for IOPS as annotations in the figure here below.
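
A quick way to cross-check the IOPS values read from the heat map is to query v$sysmetric while the test runs. A minimal sketch follows (assumption: the metric name below is as found on 11gR2; two rows may be returned, one per metric interval group, typically 60-second and 15-second intervals):

SQL> select begin_time, end_time, metric_name, round(value) as value_per_sec, metric_unit
     from v$sysmetric
     where metric_name = 'Physical Read Total IO Requests Per Sec';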

The storage system under test is a simple JBOD configuration of 2 storage arrays with 12 SAS 10K rpm disks per array. The storage is connected to the servers via Fiber Channel (8 Gbps). The database is Oracle 11.2.0.3 for Linux x86_64 with ASM. Storage is allocated on a normal redundancy disk group built with 23 disks from the 2 storage arrays.

Why is this useful? First of all it's a quick and easy way to start investigations of the storage. Single block random read latency is very important for many OLTP applications. We can therefore learn about the latency we can expect from the storage at different loads. We can learn about the maximum IOPS, and overall see the behavior at storage saturation for this type of workload.
Note also that after each run SLOB 2 produces a series of reports (including AWR and iostat) with much more information on the workload than what is available by just observing OraLatencyMap output.

Coming back to the example of the JBOD configuration, we can see from the figure below that the measured values for IOPS are consistent with expectations: each disk delivers ~200 IOPS. This is consistent with other measurements previously done on the same system; see also this presentation. The measured latency is in the range of 4-8 ms at low load and starts to increase considerably when we drive the disks closer to their maximum IOPS, which is also expected.
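
As a rough back-of-the-envelope check: 23 disks at ~200 IOPS per disk gives an expected aggregate of roughly 23 x 200 ≈ 4600 IOPS for small random reads, which is the order of magnitude to compare the measured peak values against.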

SQL> @OraLatencyMap_advanced 10 "db file sequential read" 11 110 ""



A potential pitfall when testing storage is to run our tests with too little data and in particular to have test data that fit in the controller's cache. The figure here below shows just an example of that. The test data there were easily cached by the arrays (4 GB in total for this system). The net outcome is that we have very high figures for IOPS that just don't make sense with the number and type of disks we have.
Indeed the measured latency values confirm that we are mostly reading from cache: we see that the majority of the measured wait events are in the 1 ms latency bucket (wait time of 1 ms or less).
Note on the test: the main difference between this test and the test described above is in the amount of data used. The SLOB parameter SCALE = 10000 for this test, while SCALE = 1000000 for the test discussed above.

Comment: the example described here is quite basic; however, many storage arrays these days come with large amounts of SSD cache. It is important to understand/measure whether the test data fit in the cache in order to make sense of the results of the stress tests.

SQL> @OraLatencyMap



Notes: slob.conf and other details regarding the tests. See SLOB 2 manual for more info on the meaning of the parameters.

slob.conf:
UPDATE_PCT=0
RUN_TIME=200
WORK_LOOP=0
SCALE=1000000   #for test N.2 this is scaled down to 10000
WORK_UNIT=256
REDO_STRESS=HEAVY
LOAD_PARALLEL_DEGREE=8
SHARED_DATA_MODULUS=0

How to create test data:
./setup.sh SLOB 128  #this needs about 1TB of space in the SLOB tablespace

Relevant init.ora parameters to force Oracle not to use prefetching/batching optimizations:
db_cache_size=50m
_db_block_prefetch_limit = 0
_db_block_prefetch_quota = 0
_db_file_noncontig_mblock_read_count = 0

Conclusions

OraLatencyMap is a tool for measuring and displaying wait event histogram data in Oracle as latency heat maps. The tool can be used to troubleshoot production issues related to storage latency. OraLatencyMap can be of assistance when testing storage together with Oracle-based stress testing tools such as Kevin Closson's SLOB (see http://kevinclosson.net/slob/).

Wednesday, May 8, 2013

Oracle Events' Latency Visualization and Heat Maps in SQL*plus

Topic: This post is about a technique for Oracle performance tuning, the use of heat maps to investigate wait event latency (and in particular I/O-related latency). This post also discusses a SQL*plus-based script/tool I have developed to help with this type of monitoring and performance drill-down (OraLatencyMap).

Context: Oracle exposes latency data for the wait event interface in V$EVENT_HISTOGRAM. This gives an additional dimension for drilling down into performance data for analysis and troubleshooting. In an older blog post I described an example of troubleshooting a storage issue for an OLTP (RAC) database by investigating the histogram data of the 'db file sequential read' wait event. In that context I had also developed and discussed ehm.sql, a simple PL/SQL script to collect and display data from GV$EVENT_HISTOGRAM.
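
For reference, the raw data that ehm.sql (and later OraLatencyMap) works with can be seen with a simple query of this kind; note that the counters are cumulative since instance startup, so repeated sampling and delta computation are needed to see current behavior:

SQL> select inst_id, event, wait_time_milli, wait_count
     from gv$event_histogram
     where event = 'db file sequential read'
     order by inst_id, wait_time_milli;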

What's new: An excellent article by Brendan Gregg, "Visualizing System Latency", Communications of the ACM, July 2010, has inspired me to develop an enhanced version of ehm.sql. The idea is to display in real time the current and historical values of selected wait event histograms. In particular, I/O-related events such as db file sequential read and log file sync make excellent candidates for this type of analysis. Moreover, those events are relevant in a context that is familiar to me, that is, drilling down into issues with OLTP performance and access to storage.
As Brendan shows in his article, I/O latency data fits naturally to heat map representation, where time is on the horizontal axis, latency buckets are on the vertical axis and the quantity to display (for example number of waits or time waited) is displayed as color (hence the name heat map).

The tool: OraLatencyMap is a tool I have developed to help extract and represent event histogram data as a heat map. It is intended to be lightweight and 'DBA-friendly'. It's implemented in PL/SQL, does not require the creation of any DB objects and runs under SQL*plus. OraLatencyMap requires a terminal supporting ANSI escape codes (for example PuTTY, MobaXterm or xterm; notably it does not run under Windows' cmd.exe). Making SQL*plus behave like a monitoring tool requires jumping through a few hoops. Credits to the experts who have shared their results in this area and thus made my job much easier here. In particular: Tanel Poder (for moats, sqlplus and color, snapper, etc.), Adrian Billington (moats) and Marcin Przepiorowski (topaas).

Example 1: study of db file sequential read
See below a screenshot of a PuTTY terminal where I ran SQL*plus and @OraLatencyMap. The script samples GV$EVENT_HISTOGRAM roughly every 3 seconds and displays 2 heat maps. The top heat map gives information on the number of waits per second in each latency bucket. The bottom heat map instead represents the estimated wait time per latency bucket. The two graphs represent the same type of information but from 2 different 'points of view'.

SQL> @OraLatencyMap


This type of data is useful, for example, when investigating single-block read latency issues in OLTP systems. I'll leave a discussion of the details and limitations of this approach for another time. I'll just point out that, among other things, it's important to make sure the system is not starving for CPU in order to make sense of the data (CPU data not shown here). When reading the heat map I typically focus on 3 areas: the first is the low-latency area (1 ms), which gives information on what are most likely reads from the storage cache; the second is the area around the 16 and 32 ms buckets, most likely representing physical reads from 'rotating disks'. The third very interesting area to watch is the 'high latency' one (>100 ms), that is the area of "IO latency outliers", which can be a sign of problems with the storage, for example. Note that OraLatencyMap is a drill-down tool based on Oracle instrumentation, so the interpretation of the results, especially when extended to storage, will often need additional data from the specific context being investigated.

Example 2: study of log file sync
This example is about visualizing the latency of log file sync. This can be useful when drilling down into commit latency issues. This is a critical area for many OLTP systems and the troubleshooting is often not easy: storage performance, server CPU starvation and unexpected behavior by LGWR, among others, can all potentially cause problems in this area.
This screenshot was taken from a PuTTY window, this time with a white background.

SQL> @OraLatencyMap_event 3 "log file sync"


Note: See also the blog entry on OraLatencyMap v1.1

Conclusions
Latency investigations of Oracle wait events give an additional and powerful dimension to performance tuning and troubleshooting. This is particularly useful for (synchronous) I/O-related wait events such as db file sequential read and log file sync. Latency heat maps are particularly well suited to visualizing I/O latency (see also Brendan Gregg's article). The first version of a simple SQL*plus script (OraLatencyMap) to collect and visualize event histogram data as heat maps has been discussed.
OraLatencyMap is available for download from http://canali.web.cern.ch/canali/resources.htm and on GitHub

Monday, March 25, 2013

Testing Lost Writes with Oracle and Data Guard

Topic: This post is about lost writes in Oracle: techniques for reproducing and investigating their effects, and the use of Active Data Guard to protect against lost writes.

Motivations. Imagine this scenario: a production system has two standbys to protect against disaster and to load-balance read-only workload (with Active Data Guard (ADG)). A lost write happens on the primary and remains unnoticed for a few days. Finally the block that suffered the lost write is updated again. Both standbys stop applying redo, throwing ORA-600 [3020] (also known as stuck recovery). The primary DB keeps working fine, although it is logically corrupted by the lost write.
You are the DBA in charge of fixing this, what would you do?

The importance of testing: I hope that the example above illustrates that lost writes can generate quite complex recovery scenarios and overall a few headaches for support DBAs. In this post I illustrate a few techniques and examples that can be used to test the effects of lost writes in Oracle and therefore prepare for the case when a real-world issue strikes. Of particular interest will be testing the effect of lost writes in an environment with (Active) Data Guard.
We need to cover some ground first on techniques and how to set up the test. But first a definition.

Lost writes: "A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage" (from support note 1302539.1). Lost writes can be caused by faulty storage, but also by Oracle bugs, in general anything in between our data in RAM and storage can corrupt our data, including controllers and network.

Digression on techniques.
1. A useful technique that we will need in the following is the ability to read and write a single block from Oracle data files (in a test environment). For databases on filesystems (and also DBs on NFS) dd is the tool for this job (I am using the Linux platform as reference). Examples:

read one 8KB block from filesystem (block 134 in this example):
dd if=testlostwrite.dbf bs=8192 count=1 skip=134 of=blk134.dmp 

write one 8KB block to filesystem (block 134 in this example):
dd of=testlostwrite.dbf bs=8192 count=1 seek=134 if=blk134.dmp conv=notrunc

Note that when writing to an Oracle datafile we must use conv=notrunc, or else we will end up with an unusable (truncated) output file. Note also the syntax for specifying the block offset: skip is used for input files and seek for output files (see the dd manual).

How to read and write single blocks on ASM data files. One possibility is to take a backup copy of the datafile with RMAN, edit it with dd (as detailed above), then (with RMAN again) restore the backup copy. With a little knowledge of ASM internals, more direct ways to access files in ASM are available: one can find the position of the block (and its mirror copies if relevant) and then use dd to read/write data directly. Currently my preferred way is slightly different and exploits the dbms_diskgroup package. This is an undocumented package (see again the link above on ASM internals for some additional details), although it is extensively used by Oracle's asmcmd utility. I have packaged the dbms_diskgroup.read and dbms_diskgroup.write calls into a small utility written in Perl (which I called asmblk_edit; follow this link to download asmblk_edit). Similar ideas can be found also in support note 603962.1.
The following example illustrates using the utility asmblk_edit to read and write block number 134 from and to a data file stored in ASM:

read one 8KB block from an Oracle data file in ASM (block 134 in this example):
./asmblk_edit -r -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp

write one 8KB block to an Oracle data file in ASM (block 134 in this example):
./asmblk_edit -w -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp

2. Another technique that we need is quite straightforward: finding the offset of the Oracle block that we want to read or write for our tests.
The example here below shows how to find the block number where data is stored, using the rowid:
SQL> select rowid, dbms_rowid.ROWID_BLOCK_NUMBER(rowid), a.* from testlosttable a;

Incidentally finding the block number for an index leaf block can be done with the following (using the undocumented function sys_op_lbid):
SQL> select rowid rowid_table_from_index_leaf, sys_op_lbid(18297477,'L',t.rowid) index_leaf_rowid from testlosttable t;
-- note: 18297477 in this example is the data_object_id of the index I am examining, edit with the actual value as relevant

3. Last but not least, we need a way to modify data blocks 'under the nose of Oracle', in particular we want to make sure we flush/invalidate the relevant cached data and metadata. The method we will use is:
  • Offline the tablespace where data resides (this flushes dirty blocks and invalidates cache entries)
  • Perform the read/write modifications to the block, with dd or asmblk_edit, as needed
  • Online the tablespace again before the next usage

A basic recipe to reproduce the effects of a lost write. 
We can now put together the ideas and techniques described above into a working example aimed at reproducing the effects of a lost write in Oracle:

SQL> create bigfile tablespace testlostwrite datafile '{edit with datafile directory}/testlostwrite.dbf' size 10m;
SQL> create table testlosttable (id number, payload varchar2(100)) tablespace testlostwrite ;
SQL> create index i_testlosttable on testlosttable (id) tablespace testlostwrite ;
SQL> insert into testlosttable values (10,'aaaaaaaaaaaaaaaaaaaaaaaaa');
SQL> insert into testlosttable values (20,'bbbbbbbbbbbbbbbbbbbbbbbbb');
SQL> commit;
SQL> select rowid, dbms_rowid.ROWID_BLOCK_NUMBER(rowid), a.* from testlosttable a;
--note: this allows us to find the block_id where the data resides; let's say it's block 134

SQL> alter tablespace testlostwrite offline;
-- read block either with dd or with asmblk_edit and create backup copy. Example:
-- ./asmblk_edit -r -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp
SQL> alter tablespace testlostwrite online;
SQL> insert into testlosttable values (30,'cccccccccccccccccccccccccc');
SQL> commit;
SQL> alter tablespace testlostwrite offline;
-- write block either with dd or with asmblk_edit and from previously created backup copy. Example:
-- ./asmblk_edit -w -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp
SQL> alter tablespace testlostwrite online;
SQL> -- our database has now a lost write in the table testlosttable block 134

The effect of the lost write on the table is that the row with id=30 has disappeared from the table. However, the entry with id=30 is still visible in the index i_testlosttable. This observation can be confirmed with the 2 queries reported here below. Note that in the case of normal operations (i.e. no lost writes) the 2 queries would both return three rows; this is not the case here because of our manual editing of the table block with the asmblk_edit utility.

SQL> select /*+ INDEX_FFS(a)*/ id from testlosttable a where id is not null;
SQL> select  /*+ FULL(a)*/ id from testlosttable a;
SQL> -- these 2 queries return different results in our test with a lost write.. and only one of them is correct!

We can now proceed with this additional test: insert a new row into the table, for example:

SQL> insert into testlosttable values (40,'ddddddddddddddddddddddd');
SQL> commit;

What we should notice at this point is that Oracle keeps working fine and no errors are shown to the user. With the techniques discussed above we can easily show that this new row has been inserted into block 134 (the block with the lost write). Let's postpone further investigations to a later paragraph and for now just note that Oracle has not detected that we have suffered a lost write.

An example of the effects of lost writes with Standby (Data Guard)
A standby database, among other benefits, provides a copy of the database that Oracle can use to detect lost writes (i.e. the standby provides a reference copy of 'good data'). Here is a simple test to see how this works in practice:
  • We start by going through the same steps described above and we create a lost write in the primary. 
    • Note that at this point the standby has no knowledge that something has gone wrong in the primary.
  • Now we can go ahead and run DML against the block that has suffered the lost write 
    • i.e. we insert the row with id=40 as detailed above.
  • When the standby database tries to apply the redo entry (change vector) to the block that suffered the lost write, it will compare SCNs and find that something is wrong. MRP will stop and throw ORA-600 [3020] (stuck recovery).
An example of the error stack for a stuck recovery:

ORA-00600: internal error code, arguments: [3020], [10], [134], [134], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 10, block# 134, file offset is 1097728 bytes)
ORA-10564: tablespace TESTLOSTWRITE
ORA-01110: data file 10: '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 18299147

Additional information is dumped in the trace files of MRP and of the recovery slaves.
It's worth noting that ORA-600 [3020] can be caused by a lost write (as in this example) but also by other issues, in particular with the redo consistency (outside the scope of this discussion).

Oracle 11g and lost write protection 
In the example above Oracle only throws an error alerting that something is wrong (in this case ORA-600 [3020], stuck recovery) when a DML operation is performed on the primary DB against a block that had suffered a lost write. This means that lost writes may also remain silent and unnoticed in the primary DB for a very long period of time. A new feature of 11g can be used to make Oracle more active in detecting lost write issues, in particular to check the blocks that are subject to physical reads into the buffer cache. This is how to activate these checks:
  • set the initialization parameter DB_LOST_WRITE_PROTECT = TYPICAL (or FULL if we prefer) on the primary database. This will cause the generation of additional redo entries when Oracle performs physical reads. 
  • set DB_LOST_WRITE_PROTECT = TYPICAL also on the standby: this will make MRP and its recovery slaves check for lost writes using the extra information in the redo log stream (a minimal configuration sketch is shown after this list). 
  • Outcome: if we hit a block suffering from a lost write, MRP and its slaves will stop and throw ORA-752: recovery detected a lost write of a data block 
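
A minimal configuration sketch for the two settings above (to be adapted to the specific environment, for example adding a sid clause on RAC):

SQL> alter system set db_lost_write_protect=typical scope=both;  -- run on the primary
SQL> alter system set db_lost_write_protect=typical scope=both;  -- run on the standby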
An example of the error stack for lost writes with db_lost_write_protect set to typical is pasted here below (note that additional information is dumped in the trace files of MRP and of the recovery slaves):

Hex dump of (file 10, block 134) in trace file {path..}/standby_pr09_26471.trc
Reading datafile '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685' for corruption at rdba: 0x00000086 (file 10, block 134)
Read datafile mirror 'STDBY_DATADG1_0000' (file 10, block 134) found same corrupt data (logically corrupt)
Read datafile mirror 'STDBY_DATADG1_0011' (file 10, block 134) found same corrupt data (logically corrupt)
STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE
LOST A DISK WRITE OF BLOCK 134, FILE 10
NO REDO AT OR AFTER SCN 6367748422450 CAN BE USED FOR RECOVERY.
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 134, file offset is 1097728 bytes)
ORA-10564: tablespace TESTLOSTWRITE
ORA-01110: data file 10: '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 18299538

Comparing this error message with the ORA-600 [3020] reported in the previous paragraph, the first thing we notice is that we have more information and more details. Moreover, Oracle has checked that this is indeed a case of lost write and has even tried to resolve the 'logical corruption' by reading the second mirror copy from the normal-redundancy ASM diskgroup used in this example (an attempt that did not help in this case, as both mirror copies of the block have suffered a lost write, generated in our example using the asmblk_edit utility).

DB_LOST_WRITE_PROTECT and its impact on primary and standby DBs
From the example here above we see that the extra checks that the Oracle recovery process performs when DB_LOST_WRITE_PROTECT is set in Oracle 11g are definitely an improvement over the 10g behavior, although this mechanism does not provide complete protection against lost writes; it only gives us a higher probability that a lost write will be found.
How about the impact of setting this parameter? Extra redo entries are generated on the primary: they are called block read redo (BRR). We can directly investigate BRR entries, for example by dumping redo (BRR corresponds to redo at layer 23 and opcode 2); see this example of how to do this type of dump:
SQL> alter system dump logfile '...{path to storage}.../thread_2_seq_1816.510.810661967' layer 23 opcode 2;

Another method is to query v$mystat or v$sysstat. We will look for statistics containing the text 'lost write' (and, for context, 'physical read') in their name:
SQL> select name, sum(value) from v$mystat se, v$statname n where n.statistic#=se.statistic# and (n.name like '%lost write%' or name like '%physical read%') group by name;

The size of BRR entries in the redo stream varies as Oracle's optimizations can come into play, such as batching several BRR entries into one redo record. Based on a limited number of observations of a production system, I estimate that on average we can expect about 30 bytes of extra redo generated by BRR for each physical block read performed by the database, although mileage may vary and the impact of the parameter should definitely be tested before applying it to a busy production system! Another observation based on testing is that direct read operations do not generate BRR (for example reads for parallel query).
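
As a rough way to put a number on this overhead for a given system, the ~30 bytes/read estimate above can be combined with the statistics of cached physical reads (a sketch only; the value is cumulative since instance startup, and 'physical reads cache' is used because direct reads do not generate BRR):

SQL> select name, value, round(value*30/1024/1024) as estimated_brr_redo_mb
     from v$sysstat
     where name = 'physical reads cache';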

Example of BRR entry from a logfile dump:

CHANGE #5 TYP:0 CLS:4 AFN:10 DBA:0x00000086 OBJ:18299943 SCN:0x05ca.9b6e95f9 SEQ:1 OP:23.2 ENC:0 RBL:1
 Block Read - afn: 10 rdba: 0x00000086 BFT:(1024,134) non-BFT:(0,134)
              scn: 0x05ca.9b6e95f9 seq: 0x01
              flags: 0x00000004 ( ckval )

After setting db_lost_write_protect to typical (or full) and rebooting the instance, Oracle will set _log_committime_block_cleanout to true (based on observations on my test system running 11.2.0.3). The effect is that (on the primary instances) Oracle will write into the redo logs additional entries related to commit-time block cleanout operations (layer 4 opcode 6). Logging block cleanout redo records (and subsequently applying them to the standby database) is a desirable feature, especially in an environment deploying Active Data Guard.

On a standby instance, setting db_lost_write_protect to TYPICAL or FULL will make the MRP slaves do the extra work of checking for lost writes by comparing the SCNs in BRR entries with the SCNs in the block headers. Notably, this extra work implies additional physical reads on the standby. This additional work on the standby can be monitored by querying v$sysstat (statistic name = 'recovery blocks read for lost write detection' and also the related statistic 'recovery blocks skipped lost write checks').
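
For example, on the standby instance:

SQL> select name, value from v$sysstat
     where name in ('recovery blocks read for lost write detection',
                    'recovery blocks skipped lost write checks');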

Note: checks for lost writes based on the SCNs of BRR entries are also performed when doing media recovery (i.e. a recover database command). Therefore, even if Data Guard is not available, one can use a simple restore of the database from backup to perform validation of BRR records and search for lost writes. 

Analysis and troubleshooting.
From the steps detailed above we have a test environment with a lost write that we have generated. This will allow us to have some fun investigating the issue at low level and developing techniques for analysis and troubleshooting. All this with the great advantage, compared to a real-life case, that now we know for sure what the root cause of this incident is!
A good starting point for troubleshooting lost writes in a production system is support note 1265884.1 "Resolving ORA-752 or ORA-600 [3020] During Standby Recovery". A few more pointers here below.

- One technique to investigate the affected blocks, both on the primary and on the standby, is to dump the block contents to a trace file. The idea is to compare the contents on the primary and on the standby. Of particular interest will be the SCN of the last change to the block and also the SCNs in the ITL list. Example:
SQL> alter system dump datafile 10 block 134;
SQL> --the following performs a dump of the cached version of the block 
SQL> alter session set events 'immediate trace name set_tsn_p1 level <ts#+1>'; -- where ts# is the tablespace number
SQL> alter session set events 'immediate trace name buffer level <decimal rdba>'; -- the decimal rdba is 134 in our example

- We can also investigate the contents of the block suffering a lost write using SQL. This has the advantage of allowing the use of flashback query. The current content of the block on the primary database may not be what we want: we are interested, for example, in a consistent image of the block at the SCN reported in our ORA-752 message (this would also be the current SCN of the standby, as recovery is stuck there at that point in time). Example:
SQL> set num 16
SQL> select ora_rowscn, rowid, dbms_rowid.rowid_row_number(a.rowid) row_number, a.* from testlosttable as of scn 6367748219413 a where rowid like 'ABFzkJAAAAAAACG%'; -- edit values for scn and rowid, use SQL below to find the values to use

- Digression: how to find the rowid of the block with lost write (block 134 of file 10 and object_id=18299145). Actually what we need is just the first 15 characters of the rowid (the last three characters are the row_number inside the block). Example:
SQL> select substr(DBMS_ROWID.ROWID_CREATE(rowid_type =>1, object_number =>18299145, relative_fno =>0, block_number =>134, row_number =>0),1,15) rowid_prefix from dual;

- How to find the current SCN (at the standby):
SQL> select current_scn from v$database;

- Another powerful technique is to dump redo log information from all relevant logfiles, searching for entries related to the block with the lost write (in our example it's block 134 of file 10). The dump will include transaction details and, most notably, the redo marking the time when DBWR has written the given block (this info is stored in block written redo, BWR). If the parameter db_lost_write_protect is set to typical or full, the redo dump will also show details of the block read redo (see BRR discussed above). For further info on logfile dumps see also Julian Dyke's website. Example:
SQL> alter system dump logfile '...{path to storage}.../thread_2_seq_1816.510.810661967' DBA MIN 10 134 DBA MAX 10 134; -- edit file number and block number as needed

- Additional investigation on data affected by the lost write can be performed by log mining. The goal is to find the SQL of all the transactions for the affected block. Identify the relevant redo logs to mine first. Example:
SQL> BEGIN
SYS.DBMS_LOGMNR.ADD_LOGFILE(LogFileName=>'...{path to storage}.../thread_2_seq_1816.510.810661967',Options=>SYS.DBMS_LOGMNR.NEW);
SYS.DBMS_LOGMNR.START_LOGMNR(Options=> SYS.DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/
SQL> SELECT scn,sql_redo FROM SYS.V_$LOGMNR_CONTENTS WHERE data_obj#=18299145 and row_id like 'ABFzkJAAAAAAACG%'; -- calculate rowid with dbms_rowid package as detailed above

- Other analysis that can be done in case we have indexes on the table:
If one or more indexes are present, we can read data from the index and compare the results with what we have in the table. Example of reading from the index:
SQL> select rowid rowid_table_from_index_leaf, id, sys_op_lbid(18299146,'L',t.rowid) index_leaf_rowid from testlosttable t where rowid like 'ABFzkJAAAAAAACG%';
--note: update 18299146 with the data_object_id of the index of interest 
-- in this example this is the data_object_id of I_TESTLOSTTABLE 

- We can also check if there are inconsistencies (corruption) between the table and its indexes with 'analyze table validate', although in my experience this can be quite time consuming and does not necessarily add more information to the analysis. Example:
SQL> analyze table testlosttable validate structure cascade online;

Actions that we can take to restore the services
Suppose that our analysis has confirmed that a lost write happened and also that we have the details of what 'is lost'. We now need to fix the corrupted block on the primary and restore the service on the standby. In particular, if we have an Active Data Guard with an SLA, the latter may be quite an urgent action. Hopefully we also have some idea of what the root cause was and a way to fix it in the future.

One possible action is to fail over to the standby. This action plan however may prove to be unacceptable in many circumstances, given the potential for data loss it implies. A failover would likely not be acceptable if the primary database has continued working and accepting user transactions since the time (SCN) of the incident that generated ORA-600 (or ORA-752).

Another possibility is to use our knowledge of the lost transactions, gathered in the analysis phase, to run SQL actions to 'fix the primary'. This has to be evaluated case by case. In some circumstances we can also get away with just dropping and re-creating the object with the lost write. In our simple example of a lost insert on the table testlosttable, the action to perform on the primary is:

SQL> alter index i_testlosttable rebuild online;
SQL> insert into testlosttable values (30,'cccccccccccccccccccccccccc');
SQL> commit;

What about fixing the standby? We can get the recovery on the standby unstuck by allowing it to corrupt the block with the lost write (and fix it later, as detailed below). Example:

SQL> alter system set db_lost_write_protect=none; --temporarily disable lost write check if needed
SQL> alter database recover automatic standby database allow 1 corruption;
SQL> --wait till the redo that caused the error message has been applied
SQL> alter database recover cancel;
SQL> -- restart normal Data Guard operations. An example for ADG:
SQL> alter system set db_lost_write_protect=typical;
SQL> alter database open read only;
SQL> alter database recover managed standby database nodelay using current logfile disconnect;

At this point all is OK except for one corrupted block on the standby. How to restore the corrupted block on the standby? This depends on the case: we may just rebuild the object on the primary, and this will fix the problem on the standby. Another option is to copy over the datafile from the primary to the standby.
In our example the corrupted block is block 134 of file 10 and we have an Active Data Guard in real-time apply. We can use automatic block media recovery (ABMR) to fix it. In my tests ABMR is attempted but does not really work against the corrupted block 134; I can work around this by zeroing out the block. This is an example (intended to be used on test databases):

./asmblk_edit -w -s 134 -a +STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685 -f zeroblock
where zeroblock file is created with: dd if=/dev/zero bs=8192 count=1 of=zeroblock 
if my standby were on a filesystem I could have used:
dd if=/dev/zero of=testlostwrite.dbf bs=8192 count=1 seek=134 conv=notrunc

If we now query the table (the table testlosttable in our example), when Oracle reaches the zeroed block it will fetch a copy from the primary (see the Oracle documentation for the details about configuring and using the ABMR feature). This will happen in a way that is transparent to the user issuing the query; the operation is logged in the alert log of the instance (the 2 lines here below appeared repeated twice in my test using 11.2.0.3):
Automatic block media recovery requested for (file# 10, block# 134)
Automatic block media recovery successful for (file# 10, block# 134)

Oracle's OTN demo video on lost write
Short videos with demos on testing lost writes and automatic block media recovery in a Data Guard environment can be found on OTN.
See also support document "Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration" [ID 1302539.1]

Conclusions and future work
This article illustrates a simple example of how to reproduce and investigate the effects of a lost write in a test environment, which is intended as a training exercise for DBAs.
I would like to end with four lessons learned on the topic of how lost writes can affect a high-availability Oracle deployment:
  • Finding the root causes of a lost write can prove to be very hard. Two possible sources of the root causes of these types of issues are the storage and Oracle bugs. How do we find which one (if either of the two) is to blame for a particular incident? If the issue cannot be reproduced we can easily find ourselves in the middle of a finger-pointing discussion between vendors.
  • Having a standby (Data Guard) is of great help for discovering lost writes. Without a standby there are high chances that we will not even realize we had a lost write in the first place. However, once a lost write is found the redo apply stops: the recovery is stuck and throws error ORA-600 [3020] or ORA-752 if we try to restart it. A support DBA will most likely have to analyze the situation and decide what to do to fix the issue (for example perform a failover, or rather go through a process similar to what is described in this article). Moreover, if one or more Active Data Guards are used for critical read-only activity, there is time pressure to restart the redo apply.
  • How can we check that our primary and standby databases are 'in sync', that is, that there are no lost writes waiting to 'explode' like time bombs? I am not aware of a utility that could do such a check. This is a very interesting topic, probably material for another post. A brief discussion of this problem and possible solutions can be found at this link and also here.
  • Database restores can be affected by lost writes too. For example the restore of a database backup can fail (get stuck on applying redo) because of a lost write that has happened in production after the latest data file backup. This has potential impacts on the disaster and recovery strategy.

Note: the examples reported in this article have been tested against Oracle 11.2.0.3 64 bit for Linux (RHEL5).

Thursday, February 14, 2013

AWR Analytics and Oracle Performance Visualization with PerfSheet4

Topic: This post describes PerfSheet4, a tool for performance analysis aimed at streamlining access to and visualization of Oracle's AWR data. The tool is aimed at DBAs and Oracle performance analysts. PerfSheet4 is a spin-off of previous original work by Tanel Poder, rewritten and integrated with additional functionality for AWR analysis and with important changes to the user interface.

Note added: An update to this post with the details of a more recent version of PerfSheet4 can be found at this link. See also the work on PerfSheet.js.

Context: There is much information in the counters and metrics of Oracle's AWR that can be of substantial help for troubleshooting and for capacity planning. Besides the standard AWR report, time-based analysis is often very useful. However, this type of access is made slow and tedious by a list of tasks, such as fetching data, calculating incremental changes and producing interesting graphs. This normally requires developing or downloading a bunch of scripts and adding some visualization techniques on top. See for example Karl Arao's scripts and blog.
Tanel Poder demonstrated a few years ago that the pivot chart functionality of Excel can be an excellent platform for performance data visualization, including AWR (and Perfstat) data. See his work on Perfsheet 2.0 and PerfSheet3.

What's new: The main driver behind developing PerfSheet4 was to (try to) make the user interface as simple as possible, without compromising the general nature of the tool. A set of pre-defined queries for some common AWR analyses comes with the tool. The plotting functionality will generate custom graphs for such data, providing an easy entry point to analysis.
If direct access to the DB is not possible, data can be extracted from the DB with the scripts supplied in the sql*plus_scripts directory and then loaded as csv. Similarly, export to csv can be used to archive data for later analysis.

PerfSheet4 is a free tool and is available for download from GitHub or from this link. The tool has been developed and tested on Excel 2010 against Oracle 11gR2. See here below a snapshot of the user interface with some additional details.




Why Excel: Pivot tables and charts are a great Excel feature that allows quick and easy analysis of multidimensional data. This can be of great help when troubleshooting problems in production, for example when there is a need to try out different ideas and data combinations.

Why pre-defined graphs: This feature, again, is there to make life easier for the user. The idea is to automate the tedious task of setting up the pivot chart fields and to provide an entry point to get to the needed graph.



Available queries: version 3.2p beta comes with the following queries (and pre-defined graphs):

Query name - Short description
Wait events - Extracts data from dba_hist_system_event, computes delta values between snapshots and rates (i.e. delta values divided by delta time)
System statistics - Extracts data from dba_hist_sysstat, computes delta values between snapshots and rates (i.e. delta values divided by delta time)
Stats per service - Extracts data from dba_hist_service_stat, computes delta values between snapshots and rates (i.e. delta values divided by delta time)
Workload data - Extracts data from dba_hist_sysmetric_summary; metrics give details on the usage of system resources over time ranges
IO wait latency - Extracts data from dba_hist_event_histogram for I/O-related events, computes delta values between snapshots and rates (i.e. delta values divided by delta time)
Top 5 wait per instance - Extracts data from dba_hist_system_event and dba_hist_sysstat, computes delta values and rates (delta value over delta time) and selects the top 5 non-idle events for each instance
Wait events per class - Extracts data from dba_hist_system_event, computes delta values and rates (delta value over delta time), aggregating over wait class
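
To give an idea of the kind of query behind these entries, here is a simplified sketch (an illustration only, not the actual PerfSheet4 code) of how delta values can be computed from dba_hist_system_event with analytic functions:

SQL> select sn.snap_id, sn.begin_interval_time, ev.instance_number, ev.event_name, ev.wait_class,
            ev.total_waits - lag(ev.total_waits) over
                (partition by ev.dbid, ev.instance_number, ev.event_name order by ev.snap_id) as delta_waits,
            ev.time_waited_micro - lag(ev.time_waited_micro) over
                (partition by ev.dbid, ev.instance_number, ev.event_name order by ev.snap_id) as delta_time_micro
     from dba_hist_system_event ev, dba_hist_snapshot sn
     where sn.snap_id = ev.snap_id
       and sn.dbid = ev.dbid
       and sn.instance_number = ev.instance_number
       and ev.wait_class != 'Idle'
     order by ev.event_name, ev.instance_number, sn.snap_id;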


Future work: Stabilize PerfSheet4 against bugs and for different execution environments. For this, feedback from other users would be very much appreciated. Possibly add more pre-defined queries (for AWR analytics, but not necessarily only that). Add a post with examples of how analyses performed with PerfSheet4 have been useful for my job on various occasions. In the meantime, if you find the tool useful or find bugs or have ideas for possible improvements, I'll be interested to know! Acknowledgments: Many thanks to Tanel Poder, the co-author of the tool.

Getting started video: http://youtu.be/kC0EJCLZWIw



Wednesday, November 28, 2012

Active Data Guard and UKOUG 2012

I am looking forward to participating again in the UKOUG annual conference (and also to attending the Sunday OakTable event). This is for me a great opportunity to meet and discuss with many passionate Oracle experts who regularly attend the conference and also to get up to date with the latest technology and news from the Oracle community.
My colleague Marcin and I have just finished preparing our presentation on the subject of Active Data Guard. I have enjoyed the work even though, as is often the case, it has taken quite some effort from both of us to reach the level of detail we wanted.
Our goal with this presentation is to share our thoughts and experience on Active Data Guard as we have seen it mature in our production environment over the last 10 months. We will discuss, with examples, what we think are the great points of the technology for our use cases and what we have learned from the production deployment. What has worked.. and what has not worked!
If you read this and are going to attend the UKOUG conference, you are welcome to come to our presentation on December 4th at 15:10 in Hall 5!

Slides can be downloaded from this link.

Wednesday, September 26, 2012

Command-Line DBA Scripts

Topic: A review of the command-line monitoring scripts that I currently use for Oracle.

Note: last updated in June 2015. Download links: http://canali.web.cern.ch/canali/resources.htm and https://github.com/lucacanali

Command-line is still very useful for Oracle database administration, troubleshooting and performance monitoring. One of the first things that I do when I want to work with a given Oracle database is to connect as a DBA user with SQL*plus and get information on the active sessions with a script. I will run the script a few times to get an idea of what's happening 'right now' and get a possible starting point for more systematic tuning/troubleshooting if needed.
I liken this approach to examining a few pictures from a remote webcam and trying to find out what's happening out there!.. The method can be powerful, although it has some limitations. One of its advantages is that it is easy to implement in Oracle! One of the main sources of information for this type of monitoring is GV$SESSION, which has been available in Oracle for ages. More recently GV$ACTIVE_SESSION_HISTORY has proven very useful, and I quite like the additional improvements introduced there in 11gR2.

Examples: One of my favorites is a script I have called top.sql (not a very original name, I know :) which queries gv$session (RAC-aware) and prints out what I currently find to be the most interesting details (for my environment) on active sessions, packed into roughly 200 characters per line. Another script that does a similar job, taking data from v$active_session_history, is ash.sql. The scripts ash2.sql, ash3.sql and ash4.sql are the RAC versions for 2-, 3- and 4-node RACs respectively.
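
To illustrate the idea, here is a simplified sketch (not the actual top.sql) of a query listing the currently active user sessions across RAC instances:

SQL> select inst_id, sid, serial#, username, sql_id, event, seconds_in_wait, blocking_session, program
     from gv$session
     where status = 'ACTIVE'
       and type = 'USER'
     order by inst_id, sid;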


One of the typical pain points when writing CLI monitoring scripts is deciding which information to display and how to pack it into a relatively short line. I have also found that with each major Oracle version I want to refresh my scripts with a few changes.

Scripts details: I have packaged some of the scripts that I have written and find useful in a zip file, which is available at this link (unzip locally and run using sqlplus). Note that the scripts are tested on 11gR2; many of them will also work on 10g and 12c.

Monitoring Class (Active Session Monitoring)
Script Name - Description
top.sql - List with selected details of the active sessions, from gv$session (RAC aware)
top_10g.sql - Active sessions details, from gv$session, 10g version (RAC aware)
ash.sql - List with selected details of the active sessions, from v$active_session_history (one instance only)
ash_h.sql - Usage: @ash_h <n#_sec>. Active sessions details for the last n#_sec seconds, from v$active_session_history (current instance only)
ash2.sql, ash3.sql, ash4.sql - Active sessions details, from gv$active_session_history (for RAC 2, 3 and 4 nodes)

SQL Monitoring
Script Name - Description
monitor.sql - Displays information from gv$sql_monitor for running statements. See also monitor_details.sql
monitor_plan.sql - Usage: @monitor_plan <key>. A drill-down script from monitor.sql. Prints details of the execution plan from gv$sql_plan_monitor
monitor_report_sid.sql, monitor_report_sid_active.sql - Usage: @monitor_report_sid_active <sid>. Generates an active/html SQL monitor report for the given sid. See also @monitor_report_sid <sid> for the simpler html version.
monitor_report_sql_id.sql, monitor_report_sql_id_active.sql - Usage: @monitor_report_sql_id_active <sql_id>. Generates an active/html SQL monitor report for the given sql_id. See also @monitor_report_sql_id <sql_id> for the simpler html version.

Drill-down Class
Script Name - Description
ash_sess.sql - Usage: @ash_sess <inst_id> <sid> <n#_sec>. Prints details from gv$active_session_history for a given session (RAC aware)
explain.sql - Usage: @explain <sql_id>. Prints detailed explain plan info for a given sql_id from the library cache (current instance only)
explain_awr.sql - Usage: @explain_awr <sql_id>. Prints detailed explain plan info for a given sql_id from AWR (RAC aware)
obj.sql - Usage: @obj <object_id>. Finds details for a given object_id or data_object_id
sessinfo.sql - Usage example: @sessinfo sid=<num>. Prints info on sessions that match the search pattern (RAC aware)
sql.sql - Usage: @sql <sql_id>. Prints the SQL text for a given sql_id (RAC aware)

Workload Characterization Class
Script Name - Description
sysmetric.sql, sysmetric_details.sql - Display values of selected system metrics (RAC aware)
iometric.sql, iometric_details.sql - IO utilization metrics and details (RAC aware)
ehm.sql, ehm_local.sql, ehm_micro, ehm_micro_local - Usage: @ehm 10 "db file sequential read". Event history metric data, helps finding skew in wait times, for example to study random read latency skew (RAC aware). ehm_micro -> 12c version, latency drill-down at the microsecond level.
eventmetric.sql - Wait event metric details for top wait events (RAC aware)
filemetric.sql - File metric details for top-usage files (RAC aware)
ash_top.sql - Usage: @ash_top <n#_sec>. Aggregates ASH data over n#_sec seconds (current instance only)
jobs_running.sql - Lists running jobs' details (RAC aware)
load.sql - Displays OS-load values (RAC aware)
longops.sql, longops_10g.sql, longops_details.sql - Info on long running operations from gv$session_longops (RAC aware)
transactions.sql - List of open transactions and crash recovery (RAC aware)

ASH and AWR Reports
Script Name - Description
ash_report.sql - Usage example: @ash_report sysdate-1/24 sysdate. Generates an ASH report for the current instance and displays it in a browser.
awr_report.sql - Usage: @awr_report <snap_begin> <snap_end>. Generates an AWR report for the current instance and displays it in a browser.
awr_global_report.sql - Usage: @awr_global_report <snap_begin> <snap_end>. Generates a global AWR report for the current database and displays it in a browser.
awr_snapshots.sql - Lists AWR snapshots for the last 24 hours.

Miscellaneous
Script Name - Description
kill.sql - Usage: @kill <sid> <serial#>. Kills the given session (current instance only)
kill_select.sql - Usage example: @kill_select "sql_id='xxxx'". Helps in preparing scripts for killing a large number of sessions (current instance only)
locks.sql, locks_10g.sql, locks_details.sql - Scripts to help investigating locks and blocking enqueues issues (RAC aware)
binds.sql - Usage: @binds <sql_id>. Helps in finding bind values for a given sql_id (RAC aware)
login.sql - SQL*plus startup/config script for my environment (Windows client, version 11.2.0.3)


Conclusions
Command-line scripts are still of great help for monitoring and performance troubleshooting of Oracle databases. Of particular interest are queries that print the details of active sessions, the SQL they execute and their wait event details. This makes for a quick and easy entry point to troubleshooting and to getting an idea of what's happening on the DB server. More systematic troubleshooting may be needed beyond that. Command-line scripts are easy to modify and customize; many DBAs indeed have their own tool-set. A selection of my own DBA scripts can be downloaded from this link.

A few links to end this post: First of all the excellent session snapper and the rest of the collection of scripts by Tanel Poder. In addition a mention to a few cool tools that are a sort of bridge between command-line and graphical-interface approaches: MOATS by Tanel Poder and Adrian Billington , SQL Dashboard by Jagjeet Singh and topaas by Marcin Przepiorowski.