Wednesday, May 8, 2013

Oracle Events' Latency Visualization and Heat Maps in SQL*plus

Topic: This post is about a technique for Oracle performance tuning, the use of heat maps to investigate wait event latency (and in particular I/O-related latency). This post also discusses a SQL*plus-based script/tool I have developed to help with this type of monitoring and performance drill-down (OraLatencyMap).

Context: Oracle exposes latency data for the wait event interface in  V$EVENT_HISTOGRAM. This gives an additional dimension to drill down performance data for analysis and troubleshooting. In an older blog post I described an example of troubleshooting a storage issue for an OLTP (RAC) database by investigating the histogram data of the 'db file sequential read' wait event. In that context I had also developed and discussed ehm.sql, a simple PL/SQL script to collect and display data from GV$EVENT_HISTOGRAM.

What's new: An excellent article by Brendan Gregg, "Visualizing System Latency", Communications of the ACM, July 2010 has inspired me to develop an enhanced version of ehm.sql. The idea is to display in real time data of current and historical values of selected wait event histograms. In particular I/O-related events such as db file sequential read and log file sync make excellent candidates for this type of analysis. Moreover those events are relevant in a context that is familiar to me, that is drilling down issues with OLTP performance and access to storage.
As Brendan shows in his article, I/O latency data fits naturally to heat map representation, where time is on the horizontal axis, latency buckets are on the vertical axis and the quantity to display (for example number of waits or time waited) is displayed as color (hence the name heat map).

The tool: OraLatencyMap is a tool I have developed to help extract and represent event histogram data in a heat map. It is intended to be lightweight and 'DBA-friendly'. It's implemented in PL/SQL, it does not require the creation of any DB objects and runs under SQL*plus. OraLatencyMap requires a terminal supporting ANSI escape codes (for example PuTTY, MobaXterm, xterm, notably it does not run under windows' cmd.exe). Making SQL*plus behave like a monitoring tool, requires jumping through hoops. Credits to the experts who have shared their results in this area and thus made my job much easier here. In particular: Tanel Poder (for moats, sqlplus and color, snapper, etc..), Adrian Billington (moats) and Marcin Przepiorowski (topaas).

Example 1: study of db file sequential read
See here below a screen shot of a putty terminal where I ran SQL*plus and @OraLatencyMap. The script samples GV$EVENT_HISTOGRAM roughly every 3 seconds and displays 2 heat maps. The top heat map gives information on the number of waits per second on each latency bucket. The bottom heat map instead represents the estimated wait time per latency bucket. The two graphs represent the same type of information but with 2 different 'points of view'.

SQL> @OraLatencyMap


This type of data is useful when investigating single block read latency issues in OLTP systems for example. I'll leave for another time a discussion of the details and limitations of this approach.  I'll just point out that among others, it's also important to make sure the system is not starving with CPU to make sense of the data (CPU data not shown here). When reading the heat map I typically focus on 3 areas: one is low-latency (1ms) where I get info on what is most likely reads from storage cache, the second is the area of latency around the 16 and 32 ms buckets, most likely representing physical reads from 'rotating disks'. The third very interesting area to watch is the 'high latency' (>100ms), that is the area of "IO latency outliers" and can be sign of problems with the storage for example. Note that OraLatencyMap is a drill-down tool based on Oracle instrumentation so the interpretation of the results, especially when extended to storage, will often need additional data from the specific context being investigated. 

Example 2: study of log file sync
This example is about visualizing the latency of log file sync. This can be useful when drilling down commit latency issues. This is a critical area for many OLTP systems and the troubleshooting is often not easy: storage performance, server CPU starvation and unexpected behavior by LGWR, among others can all potentially cause problems in this area.
This screen shot was taken from a putty window, this time with a white background.

SQL> @OraLatencyMap_event 3 "log file sync"



Conclusions
Latency investigations of Oracle wait events give an additional and powerful dimension to performance tuning and troubleshooting. This is particularly useful for (synchronous) I/O-related wait events such db file sequential read and log file sync. Latency heat maps are a particularly suited to visualize IO latency (see also Brendan Gregg's article). The first version of a simple SQL*plus script (OraLatencyMap) to collect and visualize event histogram data as heat maps has been discussed.
OraLatencyMap is available for download from http://canali.web.cern.ch/canali/resources.htm

Monday, March 25, 2013

Testing Lost Writes with Oracle and Data Guard

Topic: This post is about lost writes in Oracle and in particular on techniques to reproduce and investigate the effects of lost writes in a test environment and with Data Guard

Motivations: Imagine this scenario: a production system has two standbys to protect against disaster and to load balance read-only load (with Active Data Guard (ADG)). A lost write happens in the primary and remains unnoticed for a few days. Finally the block that suffered a lost write is updated again. Both standbys stop applying redo throwing ORA-600 [3020] (also known as stuck recovery). The primary DB keeps working fine, although it is logically corrupted by the lost write.
You are the DBA in charge of fixing this, what would you do?

The importance of testing: I hope that the example above illustrates that lost write can generate quite complex recovery scenarios and overall a few headaches to support DBAs. In this post I illustrate a few techniques and examples that can be used to test the effects of lost writes in Oracle and therefore prepare in case of a real-world issue strikes. Of particular interest will be to test the effect of lost writes in an environment with (Active) Data Guard.
We need to cover some ground first on techniques and how to setup the test. But first a definition.

Lost writes: "A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage" (from support note 1302539.1). Lost writes can be caused by faulty storage, but also anything in between our data in RAM (in the buffer cache, for example) and disk storage can potentially cause similar effects, this list includes also potential Oracle bugs.

Digression on techniques.
1. A useful technique that we will need in the following is the ability to read and write a single block from Oracle data files (in a test environment). For databases on filesystems (and also DBs on NFS) dd is the tool for this job (I am using the Linux platform as reference). Examples:

read one 8KB block from filesystem (block 134 in this example):
dd if=testlostwrite.dbf bs=8192 count=1 skip=134 of=blk134.dmp conv=notrunc

write one 8KB block to filesystem (block 134 in this example)::
dd of=testlostwrite.dbf bs=8192 count=1 seek=134 if=blk134.dmp conv=notrunc

Note when writing we must use conv=notrunc or else we will end up with an unusable output file. Note also the syntax for specifying the block offset, skip is used for input files (reads) and seek for output files (writes).

How to read and write single blocks on ASM data files. One possibility is to take a backup copy of the datafile with RMAN, edit it with dd (as detailed above), then (with rman again) restore the backup copy. With a little knowledge of ASM internals, more direct ways to access files in ASM are available: one can find the position of the block (and its mirror copies if relevant) and then use dd to read/write data directly. Currently my preferred way is sightly different and it exploits the dbms_diskgroup package. This is an undocumented package (see again the link above on ASM internals for some additional details) although it is extensively used by Oracle's asmcmd utility. I have packaged the dbms_diskgroup.read and dbms_diskgroup.write calls into a small utility written in perl (that I called asmblk_edit, follow this link to download asmblk_edit). Similar ideas can be found also in support note 603962.1.
The following example illustrates using the utility asmblk_edit to read and write block number 134 from and to a data files stored in ASM:

read one 8KB block from an Oracle data file in ASM (block 134 in this example):
./asmblk_edit -r -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp

write one 8KB block to and Oracle data file in ASM (block 134 in this example):
./asmblk_edit -w -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp

2. Another technique that we need is quite straightforward and allows us to find the offset of the Oracle block that we want to read/write to for our tests.
The example here below shows how to find block number where data is stored, using rowid:
SQL> select rowid, dbms_rowid.ROWID_BLOCK_NUMBER(rowid), a.* from testlosttable a;

Incidentally finding the block number for an index leaf block can be done with the following (using the undocumented function sys_op_lbid):
SQL> select rowid rowid_table_from_index_leaf, sys_op_lbid(18297477,'L',t.rowid) index_leaf_rowid from testlosttable t --note: 18297477 in this example is the data_object_id of the index I am examining, edit with the actual number as needed

3. Last but not least, we need a way to modify data blocks 'under the nose of Oracle', in particular we want to make sure we flush/invalidate the relevant cached data and metadata. The method we will use in the following is:
  • Offline the tablespace where data resides (this flushes dirty blocks and invalidates cache entries)
  • Perform the read/write modifications to the block, with dd or asmblk_edit, as needed
  • Online the tablespace again before next usage

A basic recipe to reproduce the effects of a lost write. 
We can now put together the ideas and techniques described above into a working example aimed at reproducing the effects of a lost write in Oracle:

SQL> create bigfile tablespace testlostwrite datafile '..{path to datafile}../testlostwrite.dbf' size 10m;
SQL> create table testlosttable (id number, payload varchar2(100)) tablespace testlostwrite ;
SQL> create index i_testlosttable on testlosttable (id) tablespace testlostwrite ;
SQL> insert into testlosttable values (10,'aaaaaaaaaaaaaaaaaaaaaaaaa');
SQL> insert into testlosttable values (20,'bbbbbbbbbbbbbbbbbbbbbbbbb');
SQL> commit;
SQL> select rowid, dbms_rowid.ROWID_BLOCK_NUMBER(rowid), a.* from testlosttable a;
--note: this will allow to find the block_id where data resides, let's say it's block 134

SQL> alter tablespace testlostwrite offline;
-- read block either with dd or with asmblk_edit and create backup copy. Example:
-- ./asmblk_edit -r -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp
SQL> alter tablespace testlostwrite online;
SQL> insert into testlosttable values (30,'cccccccccccccccccccccccccc');
SQL> commit;
SQL> alter tablespace testlostwrite offline;
-- write block either with dd or with asmblk_edit and from previously created backup copy. Example:
-- ./asmblk_edit -w -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp
SQL> alter tablespace testlostwrite online;
SQL> -- our database has now a lost write in the table testlosttable block 134

The effect of the lost write on the table is that the row with id=30 has disappeared from the table. However the entry with id=30 is still visible in the index. This observation can be confirmed with  the 2 queries reported here. Note that in case of normal operations (i.e. no lost writes) the 2 queries should both return three rows, this is not the case here because of our manual editing of the table block.

SQL> select /*+ INDEX_FFS(a)*/ id from testlosttable a where id is not null;
SQL> select  /*+ FULL(a)*/ id from testlosttable a;
SQL> -- these 2 queries return different results in our test with a lost write.. and only one of them is correct!

We can now proceed with a very interesting test: insert another row of data into the table, for example:

SQL> insert into testlosttable values (40,'ddddddddddddddddddddddd');
SQL> commit;

What we should notice notice at this point is that Oracle keeps working fine and no errors are shown to the user. With the techniques discussed above we can easily show that this new row has  been inserted into block 134 (the block with a lost write). Let's postpone further investigations for a later paragraph and for now just note that Oracle cannot easily find out that we have suffered a lost write.

An example of the effects of lost writes with Standby (Data Guard). 
A standby database provides a copy of the database that Oracle can use to detect lost writes (i.e. the standby provides a reference of 'good data; in first approximation). Let's see how we can test how this works in practice:
  • We start by going through the same steps described above and we create a lost write in the primary. 
    • Note that at this point the standby has no knowledge that something has gone wrong in the primary.
  • Now we can go ahead and run DML against the block that has suffered the lost write 
    • i.e. we insert the row with id=40 as detailed above.
  • At the moment when the standby database will try to apply the 'redo' (change vector) to the block with a lost write, it will compare SCN numbers and find that 'something is wrong'. MRP will stop and throw ORA-600 [3020] (stuck recovery).
Example of the error stack for a stuck recovery. Note that ORA-600 [3020] can be caused by a lost write (as in this example) but also by other issues, in particular with the redo consistency (outside the scope of this discussion):

ORA-00600: internal error code, arguments: [3020], [10], [134], [134], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 10, block# 134, file offset is 1097728 bytes)
ORA-10564: tablespace TESTLOSTWRITE
ORA-01110: data file 10: '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 18299147
Note: additional information is dumped in the trace files of MRP and of the recovery slaves

Oracle 11g and lost write protection.
In our previous example to reproduce lost write effects, Oracle will only throw an error (ORA-600 [3020], aka stuck recovery) when a DML operation is performed on the primary DB against a block that had suffered a lost write. This means that in practice lost writes may remain silently unnoticed in the primary for an unlimited period of time. A new feature of 11g can be used to make Oracle look for lost writes also in blocks used for queries (in the primary). This is what we can do to use it:
  • we set DB_LOST_WRITE_PROTECT = TYPICAL (or FULL if we prefer) on the primary database. This will cause the generation of additional redo entries when Oracle performs physical reads. 
  • we set DB_LOST_WRITE_PROTECT = TYPICAL on the standby, this will make MRP and its recovery slaves to check for lost writes using the extra information in the redo log stream. 
  • If we hit a lost write MRP and its slaves will stop and throw ORA-752: recovery detected a lost write of a data block 
An example of the error stack for lost writes with db_lost_write_protect set to typical is pasted here below. In comparison with ORA-600 [3020] seen above, in this case we have more relevant information, also Oracle has checked that this is a case of lost write and has even tried to resolve the 'logical corruption' trying to read the second mirror copy from the ASM diskgroup (which could not help in this case).

Hex dump of (file 10, block 134) in trace file {path..}/standby_pr09_26471.trc
Reading datafile '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685' for corruption at rdba: 0x00000086 (file 10, block 134)
Read datafile mirror 'STDBY_DATADG1_0000' (file 10, block 134) found same corrupt data (logically corrupt)
Read datafile mirror 'STDBY_DATADG1_0011' (file 10, block 134) found same corrupt data (logically corrupt)
STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE
LOST A DISK WRITE OF BLOCK 134, FILE 10
NO REDO AT OR AFTER SCN 6367748422450 CAN BE USED FOR RECOVERY.
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 134, file offset is 1097728 bytes)
ORA-10564: tablespace TESTLOSTWRITE
ORA-01110: data file 10: '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 18299538
Note: additional information is dumped in the trace files of MRP and of the recovery slaves

Digression on DB_LOST_WRITE_PROTECT. 
A comment: from the discussion above we see that the extra checks that oracle recovery process does when setting DB_LOST_WRITE_PROTECT are definitely an improvement from 10g behavior, although this mechanism does not provide a complete protection against lost writes but only gives us a higher probability that a lost write will be found.
How about the impact in production of this parameter? Extra redo entries are generated on the primary: they are called block read redo (BRR). We can directly investigate BRR entries with 2 simple techniques: dumping redo (BRR corresponds to redo at layer 23 and opcode 2) and querying v$mystat or v$sysstat (we will look for stats with 'lost write' in their name).
Example SQL:

SQL> alter system dump logfile '...{path to storage}.../thread_2_seq_1816.510.810661967' layer 23 opcode 2;
SQL> select name, sum(value) from v$mystat se, v$statname n where n.statistic#=se.statistic# and (n.name like '%lost write%' or name like '%physical read%') group by name;

The size of BRR entries in the redo stream varies, as redo dumps show that and a few optimizations can come into play (such as batching more entries in one redo record). Based on a few observations of a production system I estimate than on average we can expect 30 bytes of extra redo generated by BRR for each physical block read performed by the database, although mileage may vary and definitely the impact of the parameter should be tested before applying it to a busy production! Another observation based on testing is that direct read operations (for example reads for parallel query) do not generate BRR.

On a standby instance Oracle sets _db_lost_write_checking (enable scn-based lost write detection mechanism) to the value 2 (based on observation of my test system 11.2.0.3). MRP slaves do the extra work of checking for lost writes by comparing SCNs in BRRs with SCN in the block headers. Notably the extra work implies additional physical reads on the standby. This can be seen in v$sysstat (statistics name = 'recovery blocks read for lost write detection' and also the related stat 'recovery blocks skipped lost write checks').
Example of BRR entry from a logfile dump:

CHANGE #5 TYP:0 CLS:4 AFN:10 DBA:0x00000086 OBJ:18299943 SCN:0x05ca.9b6e95f9 SEQ:1 OP:23.2 ENC:0 RBL:1
 Block Read - afn: 10 rdba: 0x00000086 BFT:(1024,134) non-BFT:(0,134)
              scn: 0x05ca.9b6e95f9 seq: 0x01
              flags: 0x00000004 ( ckval )

After setting db_lost_write_protect to typical (or full) and rebooting the instance Oracle will set _log_committime_block_cleanout to true (based on observations on my test system 11.2.03). The effect is that (on the primary instances) Oracle will write into the redo logs additional entries related to block cleanout operations (layer 4 opcode 6). I would comment that logging block cleanout redo record (and subsequently applying them to the standby database) is a desirable feature especially in the case of Active Data Guard.

Note: checks for lost write based on SCN from BRR entries are performed also when doing media recovery (for example recover database command). Therefore even if Data Guard is not available one can use a simple restore of the database from backup to perform validation of BRR records to search for lost writes. 

Analysis and troubleshooting.
A good starting point for troubleshooting lost writes in a production system is support note 1265884.1 "Resolving ORA-752 or ORA-600 [3020] During Standby Recovery".
However now we have a test environment with a lost write and we can have some fun investigating it and developing techniques for analysis and troubleshooting. All this with the great advantage, compared to a real-life case, that now we know for sure what the root cause of the problem is!

- Investigate the block contents on primary and standby by dumping the block contents to a trace file. The idea is to compare the contents of the primary and standby. Of particular interest will be  the SCN of the last change to the block and also SCNs in the ITL list. This is an easy operation to perform. Example:

SQL> alter system dump datafile 10 block 134;
SQL> alter session set events 'immediate trace name set_tsn_p1 level <ts#+1>'; -- where ts# is the tablespace number
SQL> alter session set events 'immediate trace name buffer level <decimal rdba>'; --rba is 134 in our example

- We can also investigate the contents of the block with a lost write using SQL. This has the advantage of allowing the use of flashback query. This is particularly useful for reading the details of the block on the primary database. The current content of the block on the primary may not be what we want. We are interested in a consistent image at point in time with SCN equal to the SCN where we have our error message on the standby (current SCN of the standby).  Example:

SQL> set num 16
SQL> select ora_rowscn, rowid, dbms_rowid.rowid_row_number(a.rowid) row_number, a.* from testlosttable as of scn 6367748219413 a where rowid like 'ABFzkJAAAAAAACG%'; -- edit values for scn and rowid, use SQL below to find the values to use

Find rowid of the block with lost write (block 134 of file 10 and object_id=18299145). Actually what we need is just the first 15 characters of the rowid (the last three characters are the row_number inside the block). Example:

SQL> select substr(DBMS_ROWID.ROWID_CREATE(rowid_type =>1, object_number =>18299145, relative_fno =>0, block_number =>134, row_number =>0),1,15) rowid_prefix from dual;

Find current SCN (at the standby): SQL> select current_scn from v$database;

- Dump from all relevant logfiles searching for entries related to the block with lost writes (in our example it's block 134 of file 10). The dump will include transaction details and most notably redo marking the time when DBWR has written the give block (this info is sotre in block written redo, BWR). If  the parameter db_lost_write_protect is set to typical or full the redo dump will also show details of the block read redo (see BRR discussed above). For further info on logfile dumps see also Julian Dyke's website. Example:

SQL> alter system dump logfile '...{path to storage}.../thread_2_seq_1816.510.810661967' DBA MIN 10 134 DBA MAX 10 134; -- edit file number and block number as needed

- Perform log mining to find the SQL of all the transactions for the affected block. Identify the relevant redo logs to mine first. Example:

SQL> BEGIN
SYS.DBMS_LOGMNR.ADD_LOGFILE(LogFileName=>'...{path to storage}.../thread_2_seq_1816.510.810661967',Options=>SYS.DBMS_LOGMNR.NEW);
SYS.DBMS_LOGMNR.START_LOGMNR(Options=> SYS.DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/
SQL> SELECT scn,sql_redo FROM SYS.V_$LOGMNR_CONTENTS WHERE data_obj#=18299145 and row_id like 'ABFzkJAAAAAAACG%'; -- calculate rowid with dbms_rowid package as detailed above

- Other analysis that can be done in case we have indexes on the tables:
In case one or more indexes are present we can read data from the index and compare the results with what we have on the table. Example to read from the index:

SQL> select rowid rowid_table_from_index_leaf, id, sys_op_lbid(18299146,'L',t.rowid) index_leaf_rowid from testlosttable t where rowid like 'ABFzkJAAAAAAACG%';
--note: update 18299146 with the the data_object_id of the index of interest 
-- in this example this is the data_object_id of I_TESTLOSTTABLE 

Another standard check with corruption between index and table is to run analyze table validate, although in my experience this can be quite time consuming and not necessarily add more information to the analysis. Example:

SQL> analyze table testlosttable validate structure cascade online;

Actions that we can take to restore the services
Suppose that our analysis has confirmed that a lost write happened and also that we have the details of what 'is lost'. We need now to fix the corrupted block on the primary and restore the service on the standby. In particular if we have an Active Data Guard with a SLA, the latter may be quite an urgent action. Hopefully we also have some ideas of what the root cause was and a way to fix it in the future.

One possible action is to failover to standby. This action plan however may provide to be unacceptable in many circumstances, given the potential for data loss it implies. A failover would likely not be acceptable if the primary database has continued working and accepting users transactions since the time (SCN) of the incident that has generated ORA-600 (or ORA-752).

Another possibility is to use our knowledge of the lost transactions gathered in the analysis phase to run SQL actions to 'fix the primary'. This has to be evaluated case by case. In some circumstances we can also get away with just dropping and recreating the object with a lost write. In our simple example of a lost insert on table testlosttable, the action to perform on the primary is:

SQL> alter index i_testlosttable rebuild online;
SQL> insert into testlosttable values (30,'cccccccccccccccccccccccccc');
SQL> commit;

What about fixing the standby? We can unstuck the recovery on the standby by allowing it to corrupt  the block with a lost write (and fix it later, as detailed below). Example:

SQL> alter system set db_lost_write_protect=none; --temporarily disable lost write check if needed
SQL> alter database recover automatic standby database allow 1 corruption;
SQL> --wait till the redo that cause the error message has been applied
SQL> alter database recover cancel;
SQL> -- restart normal Data Guard operations. An example for ADG:
SQL> alter system set db_lost_write_protect=typical;
SQL> alter database open read only;
SQL> alter database recover managed standby database nodelay using current logfile disconnect;

At this point all is OK except for one corrupted block on the standby. How to restore the corrupted block on standby? This depends on the case, we may just rebuild the object on primary this will fix the problem on standby. Another option is to copy over the datafile from primary to standby
In our example the corrupted block is block 134 of file 10 and we have an active data guard in real time apply. We can use automatic block media recovery (ABMR) to fix it. In my tests AMBR is attempted but does not really work against the corrupted block 134, I can work around this by zeroing out the block. This is an example (intended to be used on test databases):

./asmblk_edit -w -s 134 -a +STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685 -f zeroblock
where zeroblock file is created with: dd if=/dev/zero bs=8192 count=1 of=zeroblock 
if my standby was on filesystem I could have used:
dd if=/dev/zero of=testlostwrite.dbf bs=8192 count=1 seek=134 if=blk134.dmp conv=notrunc

If we now query the table (the table testlostwrite in our example) when Oracle reaches the zeroed block it will fetch a copy from production (see the manual for the details of configuring ABMR). This will happen in a transparent way for the user issuing the query, the operation is logged in the alert log of the instance (the 2 lines here below appeared repeated twice in my test using 11.2.0.3):
Automatic block media recovery requested for (file# 10, block# 134)
Automatic block media recovery successful for (file# 10, block# 134)
Oracle's OTN demo video on lost write
Short videos with demos on testing lost writes and automatic block media recovery in a Data Guard environment can be found on OTN.
See also support document "Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration" [ID 1302539.1]

Conclusions and future work
This article illustrates a simple example on how to reproduce and investigate the effects lost write in a test environment, which is intended as a training exercise for DBAs.
I would like to end with four points that I still find unsettling when thinking at how lost writes can affect an Oracle setup configured for high availability:
  • Finding the root causes of a lost write can prove to be very time consuming: was it caused by the storage, or by an Oracle bug, or somewhere in between? If the issue cannot be re-produced you can easily find yourself in between a finger-pointing discussion between vendors.
  • Having a standby is of great help for discovering lost writes. However once a lost write is found the redo apply stops: the recovery is stuck and throws error ORA-600 [3020] or ORA-752 if we try to restart it. A support DBA most likely will have to analyze the situation and decide what to do to fix the issue (for example perform a failover or rather going through a process similar to what described in this article). Moreover if one or more Active Data Guards are used for critical read-only activity, there is time pressure to restart the redo apply.
  • How can we check that our primary and standby databases are 'in sync', that is that there are no lost writes waiting to 'explode' as time bombs? I am not aware of a utility that could do such check. This is a very interesting topic, probably material for another post. A brief discussion of this problem and possible solutions can be found at this link and also here.
  • Database restores can be affected by lost writes too. For example the restore of a database backup can fail (get stuck on applying redo) because of a lost write that has happened in production after the latest data file backup. This has potential impacts on the disaster and recovery strategy.

Note: the examples reported in this article have been tested against Oracle 11.2.0.3 64 bit for Linux (RHEL5).

Thursday, February 14, 2013

AWR Analytics and Oracle Performance Visualization with PerfSheet4

Topic: This post describes PerfSheet4, a tool for performance analysis aimed at streamlining access and visualization of Oracle's AWR data. The tool is aimed at DBAs and Oracle performance analysts. PerfSheet4 is a spin off of previous original work by Tanel Poder, rewritten and integrated with additional functionality for AWR analysis and with important changes to the user interface.

Context: There is much information in the counters and metrics of Oracle's AWR that can be of substantial help for troubleshooting and for capacity planning. Besides the standard AWR report, time-based analysis is often very useful. However this type of access is made slow and tedious by a list of tasks, such as, fetching data, calculating incremental changes and producing interesting graphs.This normally requires developing or downloading a bunch of scripts and adding some visualization techniques on top. See for example Karl Arao's scripts and blog.
Tanel Poder has already demonstrated a few years ago that the pivot chart functionality of Excel can be an excellent platform for performance data visualization including AWR (and Perfstat) data. See his work on  Perfsheet 2.0 and PerfSheet3.

What's new: The main drive to develop PerfSheet4 was to (try to) make the user interface as simple as possible, without compromising the general nature of the tool. A set of pre-defined queries for some common AWR analysis come with the tool. The plotting functionality will generate custom graphs for such data, an easy entry point to analysis.
If direct access to the DB is not possible, data can be extracted from the DB with the scripts supplied in the sql*plus_scripts directory and then loaded as csv. Similarly, export to csv can be used to archive data for later analysis.

PerfSheet4 is a free tool and is available for download from this link. The tool has been developed and tested on Excel 2010 against Oracle 11gR2. See here below a snapshot of the user interface with some additional details.




Why Excel: Pivot tables and charts are a great feature available in Excel that allows to quickly and easily analyze multidimensional data. This can be of great help when troubleshooting problems in production, for example with the need to try out different ideas and data combinations

Why pre-defined graphs: This feature is again to make life easier for the user. The idea is to automates the tedious task of setting up the pivot chart fields and provides an entry point to get to the needed graph.



Available queries: version 3.2p beta comes with the following queries (and pre-defined graphs):

Query name name Short description
Wait events Extracts data from dba_hist_system_event, computes delta values between snapshots and rates (i.e. delta values divided over delta time)
System statistics Extracts data from dba_hist_sysstat, computes delta values between snapshots and rates (i.e. delta values divided over delta time)
Stats per service Extracts data from dba_hist_service_stat, computes delta values between snapshots and rates (i.e. delta values divided over delta time)
Workload data Extracts data from dba_hist_sysmetric_summary, metrics give details on the usage of system resources over time ranges
IO wait latency Extracts data from dba_hist_event_histogram for io related events, computes delta values between snapshots and rates (i.e. delta values divided over delta time)
Top 5 wait per instance Extracts data from dba_hist_system_event and dba_hist_sysstat, computes delta values and rates (delta value over delta time) and selects top 5 non idle events for each instance
Wait events per class Extracts data from dba_hist_system_event, computes delta values and rates (delta value over delta time) aggregating over wait class


Future work: Stabilize Perfsheet4 against bugs and for different execution environments. For this feedback from other users would be very much appreciated. Possibly add more pre-defined queries (for AWR analytics but not necessarily only that). Add a post with examples of how analysis performed with PerfSheet4 have been useful for my job in various occasions. In the mean time if you find the tool useful or find bugs or have ideas for possible improvements, I'll be interested to know!

Acknowledgments: Many thanks to Tanel Poder, the co-author of the tool.

Wednesday, November 28, 2012

Active Data Guard and UKOUG 2012

I am looking forward to participating again to the UKOUG annual conference (and also to attend the Sunday's OakTable event). This is for me a great opportunity to meet and discuss with many passionate Oracle experts who regularly attend the conference and also to get up-to-date with the latest technology and news from the Oracle community.
My colleague Marcin and I have just finished preparing our presentation on the subject of Active Data Guard. I have enjoyed the work even though, as it is often the case, this has taken quite some effort from both of us to reach the level of details we wanted.
Our goal with this presentation is to share our thoughts and experience on Active Data Guard as we have seen it mature in our production environment in the last 10 months. We will discuss, with examples, what we think are the great points of the technology for our use cases and what we have learned from production deployment. What has worked.. and what has not worked!
If you read this and are going to participate to the UKOUG conference, you are welcome to come to our presentation on December 4th at 15:10 in Hall 5!

Slides can be downloaded from this link.

Wednesday, September 26, 2012

On Command-Line DBA

Topic: A review of the command-line monitoring scripts that I currently use for Oracle.

Command-line is still very useful for Oracle database administration, troubleshooting and performance monitoring. One of the first thins that I do when I want to work with a given Oracle database is to connect as a privileged user with Sql*plus and get information on the active sessions with a script. I will run the script a few times to get an idea of what's happening 'right now' and get a possible starting point for more systematic tuning/troubleshooting if needed.
I liken this approach to examining a few pictures from a remote webcam and trying to find out what's happening out there!.. The method can be powerful, although with some limitations. One of its advantages is that it is easy to implement in Oracle! One of the main source of information for this type of monitoring is GV$SESSION, which is available in Oracle since ages. More recently GV$ACTIVE_SESSION_HISTORY has proven very useful and I quite like the additional improvements introduced there in11gR2.

Examples: One of my favorites is a script I have called top.sql (not very much an original name I know :) which queries gv$session (RAC-aware) and prints out what I currently find the most interesting details (for my environment) on active sessions that I can fit in roughly 200 characters. Another script that does a similar job taking data from v$active_session_history is ash.sql together with 'its brothers' ash2.sql and ash4.sql which are meant for RAC of 2 and 4 nodes respectively.


One of the points that I have found as most time-consuming when writing such monitoring scripts has to do with deciding which information I wanted to display and how to pack it in a relatively short line. I have also found that with each major Oracle versions I want to refresh my scripts with a few changes. I have recently done that when moving from 10g to 11gR2.

Notable info displayed by top.sql: SQL_DT=calculated as round((sysdate-sql_exec_start)*24*3600,1); CALL_DT=last_call_et; W_dT=seconds in wait (if waiting) or else seconds since last wait; OBJ#=object number currently under operation, TR=has the session an open transanction?

Notable info displayed by ash.sql: EXEC_PLAN_LN#_OBJ#=which step of the execution plan is the session executing? at which line and on which object?; "DB%,CPU" percentage active in terms of db time and cpu time; "R,W_IOP"=read and write IOPS; "R,W_MBP"= read and write throughput in MB/s;  "PGA,TEMP_"=pga and temporary space usage.

Scripts details: I have packaged some of the scripts that I have written and that I find useful in a zip file which is available at this link (see there for the most recent version of the Monitoring Scripts file).  Note that the scripts are tested on 11gR2, some of them will also work on 10g.

Monitoring Class
Script Name Description
top.sqlList with selected details of the active sessions, from gv$session (RAC aware)
top_10g.sqlActive sessions details, from gv$session, 10g version  (RAC aware)
ash.sqlList with selected details of the active sessions, from v$active_session_history (one instance only)
ash_h.sqlUsage @ash_h n#_sec. Active sessions details for the last n#_sec seconds, from v$active_session_history (current instance only)
ash2.sql, ash4.sqlActive sessions details, from gv$active_session_history (for RAC 2 and 4 nodes)

Drill-down Class
Script Name Description
ash_sess.sqlUsage: @ash_sess inst_id sid n#_sec. Prints details from gv$active_session_history for a given session (RAC aware)
explain.sqlUsage: @explain sql_id. Prints detailed explain plan info for given sql_id from libary cache (current instance only)
explain_awr.sqlUsage: @explain_awr sql_id. Prints detailed explain plan info for given sql_id from AWR (rac aware)
kill.sqlUsage @kill sid serial# (current instance only)
sessinfo.sqlUsage example: @sessinfo sid=xxx. Prints info on sessions that match the search pattern (RAC aware)
sql.sqlUsage: @sql sql_id. Prints sql text for given sql_id (RAC aware)

Workload Characterization Class
Script Name Description
ash_top.sqlUsage @ash_top n#_sec. Aggregates ash data over n#_sec seconds (current instance only)
ehm.sqlUsage example: @ehm 15 db%sequential. Event history metric data, helps finding skew in wait times, for example to study random reads latency skew (RAC aware)
ehm_local.sqlSingle-node version of ehm.sql: event history metric data, helps finding skew in wait events (current instance only)
eventmetric.sqlWait event metric details for top wait events (RAC aware)
filemetric.sqlFile metric details for top-usage files (RAC aware)
iometric.sql, iometric_details.sqlIO metric details (RAC aware)
jobs_running.sqlList running jobs details (RAC aware)
load.sqlDisplays OS-load values (RAC aware)
longops.sql, longops_10g.sql, longops_details.sqlInfo on long running operations from gv$sessoin_longops (RAC aware)
monitor.sql, monitor_detailsDisplay information from statements instrumented by sql monitoring
monitor_planUsage @monitor sql_monitor_key. Drill down from monitor.sql on the details of the execution plan with sql monitoring data per execution plan line
obj.sqlUsage: @obj object_id. Find details for a given object id, or data_object_id
sysmetric.sql, sysmetric_details.sqlSelected system metric details (RAC aware)
transactions.sqlList of open transactions and crash recovery (RAC aware)

Miscellaneous
Script Name Description
binds.sqlUsage: @binds sql_id. Helps in finding bind values for a given sql_id (RAC aware)
kill_select.sqlUsage example: @kill_select "sql_id='xxxx'". Helps in preparing scripts for killing large number of sessions (current instance only)
locks.sql, locks_10g.sql, locks_details.sqlScripts to help investigating locks and blocking enqueues issues (RACaware)
login.sqlSql*plus startup/config script for my environment (windows client version 11.2.0.3)

Conclusions: command-line scripts can be of great help for monitoring and  performance troubleshooting of Oracle databases. Of particular interest are queries to print the details of active sessions, the SQL they execute and their wait events details. This makes for a quick and easy entry point to troubleshooting and to get an idea of what's happening on the DB server. More systematic troubleshooting may be needed beyond that. Command-line scripts are easy to modify and customize, many DBAs indeed have their own tool-set. In this post I have exposed some of the scripts I have written and still find useful: download from this link.

A few links to end this post. First of all the excellent session snapper script by Tanel Poder. In addition a mention to a few cool tools that seem to be a sort of bridge (missing link? :) between command-line and graphical-interface approaches: MOATS by Tanel Poder and Adrian Billington , SQL Dashboard by Jagjeet Singh and topaas by Marcin Przepiorowski.

Thursday, August 16, 2012

How to Turn Off Adaptive Cursor Sharing, Cardinality Feedback and Serial Direct Read

Scope: New features and improvements on the SQL (cost-based) execution engine are great. However sometimes the need comes to turn some of those new features off. Maybe it's because of a bug that has appeared or simply because we just want consistency, especially after an upgrade. This is often the case when upgrading an application that has been tuned already with heavy usage of hints for critical SQL.The discussion of pros and cons of using hints is a very interesting topic but outside the scope of this entry.

When this is relevant: using a full set of hints (an outline) normally provides consistent SQL execution for a given Oracle version (and set of session/instance parameter values). However this is not guaranteed to be stable across upgrades. The reason is that additional functionality of the cost based optimization engine can play a role. This is indeed the case when moving from 10g to 11g and in particular to 11gR2. It's a good idea to investigate the new features and also to keep a note aside on how to disable them in case of need.

Three 11g features of cost-based optimization and SQL execution to consider when upgrading from 10g:


Cardinality feedback. This feature, enabled by default in 11.2, is intended to improve plans for repeated executions. See Support note 1344937.1 and documentation for additional info. You can be disable cardinality feedback with an underscore parameter which can be set at session or instance level. Usual caveats on setting underscore parameters apply:
alter session set "_OPTIMIZER_USE_FEEDBACK"=FALSE;
This can also be done with a hint, therefore at statement level, using the opt_param syntax: /*+ opt_param('_OPTIMIZER_USE_FEEDBACK','FALSE') */


Adaptive cursor sharing. This was introduced in 11gR1 to address SQL performance consistency issues related to bind variable peeking.  See support note 740052.1 and documentation for details. For the purposes of this doc, if you want to get the 10g behavior with  an Oracle version higher than 10g and overall if you want to disable this feature, you may want to add the hint:
/*+ NO_BIND_AWARE */. 
According to support note 11657468.8 adaptive cursor sharing can be disabled by setting the following 2 parameters (say at session level): _optimizer_adaptive_cursor_sharing = false_optimizer_extended_cursor_sharing_rel = "none"

Serial direct read. This feature, introduced in 11gR2, is about sql execution as not strictly speaking on cost-based optimization, it still makes sense to discuss it here as it can change the behavior of full scans: by forcing the use of direct read instead of 'normal cache-based access'. There are many interesting details to this feature that are outside the scope of this discussion. To revert to 10g behavior and overall to disable the feature you need to set the following parameter (there is no hint for this unfortunately, as this feature is not directly linked to CBO):
alter session set "_SERIAL_DIRECT_READ"=never;
Note, at the opposite side of the spectrum one can force serial direct reads with: alter session set "_serial_direct_read"=always;

Additional info and comments:


1. When testing Oracle version upgrades and facing regression of SQL, it can be useful to set cost-based optimization to behave as it did in the older version. For example when upgrading from 10.2.0.5 to 11.2.03 we can use the following hint on SQL we are testing: SQL hint/*+ OPTIMIZER_FEATURES_ENABLE('10.2.0.5') */ or set the parameter OPTIMIZER_FEATURES_ENABLE at session level (or in some cases even at system level).
However as discussed above, setting optimizer_features_enable parameter sometimes is not enough to ensure that a given SQL will execute in exactly the same way in the new system as it did in the older version. In particular related to the 3 examples above, adaptive cursor sharing and cardinality feedback are tuned off by reverting CBO compatibility to a 10g version, although the behavior of serial direct read is not affected by this hint (or any other hint for that matter).

2. How to get a full list of available hints? query V$SQL_HINT in Oracle 11.2 or higher

3a. Example of a logon trigger that can be used to set session-level parameters:

create or replace trigger change_session_parameters
AFTER LOGON ON mydbuser.SCHEMA --customize username, replace "mydbuser" with relevant user name
BEGIN
execute immediate 'alter session set "_serial_direct_read"=never';
END;
/

3b. Alternative trigger definition


create or replace TRIGGER trace_trig
AFTER LOGON
ON DATABASE
DECLARE
 sqlstr1 VARCHAR2(100) := 'alter session set "_serial_direct_read"=never';
BEGIN
  IF (USER IN ('USERNAME1', 'USERNAME2')) THEN -- customize user names as needed
    execute immediate sqlstr1;
  END IF;
END trace_trig;
/


4. Example of how to get the outline (full set of hints) for a given query:

select * from table(dbms_xplan.display_cursor('sql_id',null,'OUTLINE')); -- edit sql_id with actual value


Wednesday, July 18, 2012

Recursive Subquery Factoring, Oracle SQL and Physics

Introduction: this post  is about the use of recursive subquery factoring in Oracle to find numerical solutions to the equations of classical mechanics, for demonstration purposes. This technique will be introduced using 4 examples or growing complexity. In particular by the end of the post we will be able to compute the length of the (sideral) year using the laws of physics and SQL!

Recursive subquery factoring (aka recursive common table expressions) is a new feature of Oracle 11gR2 that has already been used for businness and for play: see this post on solving sudokus with Oracle and this other postThe computational strength of the feature comes from the fact that it makes recursion easily available from within SQL and so opens the door to natural ways of implementing several algorithms for numerical computation. 
Additional note, the amount of physics and maths used have been kept to a minimum and simplified where possible, hopefully without introducing too many mistakes in the process. Links are provided in the text for additional investigations on the topic. 

Example 1: projectile in gravity field


The motion is on 1 dimension, say the x axis. Our first task is to find a representation of the state of the system with a structure that can be computed in the database. A natural choice is to use the tuple (t, x, v, a). Where t is time, x is the position along the 1-dimensional axis of motion, v is the velocity and a is the acceleration.
Newton's second law (F=ma) gives us the rule of the game when we want to compute the state of the system subject to and external force. Using calculus this can be written as:  


Next we need a method to find numerical (approximate) solutions. We can use the Euler integration method, which basically means doing the following:

Another important point is that we need to know the initial conditions for the state tuple, in particular initial  position and velocity. In the example we will take x(t=0)=0 m and v(t=0)=100 m/sec. The force is the gravitational pull at the surface of the Earth: F=--m*g, where g=9.81 m/sec^2. Note the mass cancels out in our equations. This example models the motion of a projectile that we shoot vertically into the air and we observe as it rises up to about 500 meters and then falls back down, all this happens in about 20.5 seconds. The SQL used and a graph of the results are here below:

define dT=0.1
define g=9.81
define maxT=20

-- SQL to compute the motion of a projectile subject to gravity
WITH data(t,x,v,a) AS (
 SELECT cast(0 as binary_double) t, cast(0 as binary_double) x, cast (100 as binary_double) v, cast(-&g as binary_double) a FROM dual
 UNION ALL
 SELECT t+&dT, x+v*&dT, v+a*&dT, -&g FROM data WHERE t < &maxT
)
SELECT t,x,v,a FROM data;



Example 2: Harmonic oscillator


In this example we investigate the motion of a mass attached to a spring. We expect the mass to oscillate around a central point (x=0).
For greater accuracy in calculations we use a different integration method: the Verlet integration method (see also this link). The equation for the acceleration is: a = F/m =-K*x and initial conditions are: x(t=0)=1 m and v(t=0)=0 m/sec. Moreover we take K0.1 sec^-2See below the SQL used for the calculation and a graph of the results.

define dT=0.1  
define K=0.1
define maxT=20

-- SQL to compute the motion of a harmonic oscillator
WITH data(t,x,v,a) AS (
 SELECT cast(0 as binary_double) t, cast(1 as binary_double) x, cast (0 as binary_double) v, cast(-&K*1 as binary_double) a FROM dual
 UNION ALL
 SELECT t+&dT, x+v*&dT+0.5*a*&dT*&dT, v+0.5*(a-&K*(x+v*&dT))*&dT, -&K*(x+v*&dT+0.5*a*&dT*&dT) FROM data WHERE t < &maxT
)
SELECT t,x,v,a FROM data;




Example 3: Motion of the Earth around the Sun


The motion of the system Earth-Sun is a problem of 2 bodies moving in space. With a 'standard trick' this can be reduced to a problem of 1 body, and 2 spatial variables, which represent the (vector) distance of the Earth from the Sun in the place of the orbit. Our description of the system will use the following tuple: (t,x,vx,ax,y,vy,ay), that is time, position, velocity and acceleration for a 2-dimensional problem in the (x,y) plane. The equation for the force is Newton's law of universal gravitationAnother important point again is to use the correct initial conditions. These can be taken from astronomical observations, x(t=0)=152098233 Km (also know as the aphelion point) and v=29.291 Km/sec (credits to this link and this other link). We will use again the Verlet integration method as in example 2 above. Note, for ease of computation, time and space will be re-scaled so that t=1 means 1000 sec and x=1 means 1 Km (same is true for the y axis). The SQL used is pasted here below as well as a graph of the computed results, that is our approximate calculation of the Earth's orbit

-- length unit = 1 km
-- time unit = 1000 sec
define dT=.1
define aph=152098233
define GM=132712838618000000
-- note this is GM Sun + Earth (reduced mass correction)
define v_aph=29291
define maxT=40000

-- SQL to compute the trajectory of the Earth around the Sun
WITH data(step, t,x,y,vx,vy,ax,ay) AS (
 SELECT 0 step, cast(0 as binary_double) t, cast(&aph as binary_double) x, cast(0 as binary_double) y,
        cast(0 as binary_double) vx, cast(&v_aph as binary_double) vy,
        cast(-&GM/power(&aph,2) as binary_double) ax, cast(0 as binary_double) ay  FROM dual
 UNION ALL
 SELECT step+1, t+&dT, x+vx*&dT+0.5*ax*&dT*&dT, y+vy*&dT+0.5*ay*&dT*&dT,
        vx+0.5*(ax-&GM*(x+vx*&dT)/power(x*x+y*y,1.5))*&dT,
        vy+0.5*(ay-&GM*(y+vy*&dT)/power(x*x+y*y,1.5))*&dT,
        -&GM*(x+vx*&dT+0.5*ax*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5),
        -&GM*(y+vy*&dT+0.5*ay*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5)
 FROM data WHERE t < &maxT
)
SELECT t,x,y,vx,vy,ax,ay FROM data WHERE mod(step,100)=0; -- output only one point every 100



Example 4: Compute the length of the sideral year using the equations of motion of the Earth around the Sun and SQL


As a final example and an 'application' of the techniques above we can use SQL to compute the number of days year (or rather in a sideral year, see the link for additional details). We find a result that is in agreement with measurements with 6 significant digits. This is an interesting result considering that it is obtained with just a few lines of SQL!

-- This is a wrapper on the equation discussed in example 2 above
-- SQL to compute the period of revolution of the Earth around the Sun, that is the number of seconds in 1 sideral year

select round(t*1000/(3600*24),3) days_in_sideral_year  from (
 WITH data(step, t,x,y,vx,vy,ax,ay) AS (
  SELECT 0 step, cast(0 as binary_double) t, cast(&aph as binary_double) x, cast(0 as binary_double) y,
        cast(0 as binary_double) vx, cast(&v_aph as binary_double) vy,
        cast(-&GM/power(&aph,2) as binary_double) ax, cast(0 as binary_double) ay  FROM dual
  UNION ALL
  SELECT step+1, t+&dT, x+vx*&dT+0.5*ax*&dT*&dT, y+vy*&dT+0.5*ay*&dT*&dT,
        vx+0.5*(ax-&GM*(x+vx*&dT)/power(x*x+y*y,1.5))*&dT,
        vy+0.5*(ay-&GM*(y+vy*&dT)/power(x*x+y*y,1.5))*&dT,
        -&GM*(x+vx*&dT+0.5*ax*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5),
        -&GM*(y+vy*&dT+0.5*ay*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5)
  FROM data WHERE t < &maxT
 )
 SELECT step,t,y,lead(y) over(order by step) next_y FROM data
) where y<0 and next_y>0;


DAYS_IN_SIDERAL_YEAR
--------------------
             365.256


Conclusions:

We have discussed in 4 examples the usage of Oracle SQL and in particular of recursive subquery factoring, applied to solve selected cases of differential equations of classical mechanics. Despite the simplifications and approximations involved, the technique has allowed us to compute the length of the sideral year with good precision. This method can be extended to make calculation for more complex systems, such as systems of many particles.