Wednesday, September 26, 2012

On Command-Line DBA

Topic: A review of the command-line monitoring scripts that I currently use for Oracle.

Command-line is still very useful for Oracle database administration, troubleshooting and performance monitoring. One of the first thins that I do when I want to work with a given Oracle database is to connect as a privileged user with Sql*plus and get information on the active sessions with a script. I will run the script a few times to get an idea of what's happening 'right now' and get a possible starting point for more systematic tuning/troubleshooting if needed.
I liken this approach to examining a few pictures from a remote webcam and trying to find out what's happening out there!.. The method can be powerful, although with some limitations. One of its advantages is that it is easy to implement in Oracle! One of the main source of information for this type of monitoring is GV$SESSION, which is available in Oracle since ages. More recently GV$ACTIVE_SESSION_HISTORY has proven very useful and I quite like the additional improvements introduced there in11gR2.

Examples: One of my favorites is a script I have called top.sql (not very much an original name I know :) which queries gv$session (RAC-aware) and prints out what I currently find the most interesting details (for my environment) on active sessions that I can fit in roughly 200 characters. Another script that does a similar job taking data from v$active_session_history is ash.sql together with 'its brothers' ash2.sql and ash4.sql which are meant for RAC of 2 and 4 nodes respectively.


One of the points that I have found as most time-consuming when writing such monitoring scripts has to do with deciding which information I wanted to display and how to pack it in a relatively short line. I have also found that with each major Oracle versions I want to refresh my scripts with a few changes. I have recently done that when moving from 10g to 11gR2.

Notable info displayed by top.sql: SQL_DT=calculated as round((sysdate-sql_exec_start)*24*3600,1); CALL_DT=last_call_et; W_dT=seconds in wait (if waiting) or else seconds since last wait; OBJ#=object number currently under operation, TR=has the session an open transanction?

Notable info displayed by ash.sql: EXEC_PLAN_LN#_OBJ#=which step of the execution plan is the session executing? at which line and on which object?; "DB%,CPU" percentage active in terms of db time and cpu time; "R,W_IOP"=read and write IOPS; "R,W_MBP"= read and write throughput in MB/s;  "PGA,TEMP_"=pga and temporary space usage.

Scripts details: I have packaged some of the scripts that I have written and that I find useful in a zip file which is available at this link (see there for the most recent version of the Monitoring Scripts file).  Note that the scripts are tested on 11gR2, some of them will also work on 10g.

Monitoring Class
Script Name Description
top.sqlList with selected details of the active sessions, from gv$session (RAC aware)
top_10g.sqlActive sessions details, from gv$session, 10g version  (RAC aware)
ash.sqlList with selected details of the active sessions, from v$active_session_history (one instance only)
ash_h.sqlUsage @ash_h n#_sec. Active sessions details for the last n#_sec seconds, from v$active_session_history (current instance only)
ash2.sql, ash4.sqlActive sessions details, from gv$active_session_history (for RAC 2 and 4 nodes)

Drill-down Class
Script Name Description
ash_sess.sqlUsage: @ash_sess inst_id sid n#_sec. Prints details from gv$active_session_history for a given session (RAC aware)
explain.sqlUsage: @explain sql_id. Prints detailed explain plan info for given sql_id from libary cache (current instance only)
explain_awr.sqlUsage: @explain_awr sql_id. Prints detailed explain plan info for given sql_id from AWR (rac aware)
kill.sqlUsage @kill sid serial# (current instance only)
sessinfo.sqlUsage example: @sessinfo sid=xxx. Prints info on sessions that match the search pattern (RAC aware)
sql.sqlUsage: @sql sql_id. Prints sql text for given sql_id (RAC aware)

Workload Characterization Class
Script Name Description
ash_top.sqlUsage @ash_top n#_sec. Aggregates ash data over n#_sec seconds (current instance only)
ehm.sqlUsage example: @ehm 15 db%sequential. Event history metric data, helps finding skew in wait times, for example to study random reads latency skew (RAC aware)
ehm_local.sqlSingle-node version of ehm.sql: event history metric data, helps finding skew in wait events (current instance only)
eventmetric.sqlWait event metric details for top wait events (RAC aware)
filemetric.sqlFile metric details for top-usage files (RAC aware)
iometric.sql, iometric_details.sqlIO metric details (RAC aware)
jobs_running.sqlList running jobs details (RAC aware)
load.sqlDisplays OS-load values (RAC aware)
longops.sql, longops_10g.sql, longops_details.sqlInfo on long running operations from gv$sessoin_longops (RAC aware)
monitor.sql, monitor_detailsDisplay information from statements instrumented by sql monitoring
monitor_planUsage @monitor sql_monitor_key. Drill down from monitor.sql on the details of the execution plan with sql monitoring data per execution plan line
obj.sqlUsage: @obj object_id. Find details for a given object id, or data_object_id
sysmetric.sql, sysmetric_details.sqlSelected system metric details (RAC aware)
transactions.sqlList of open transactions and crash recovery (RAC aware)

Miscellaneous
Script Name Description
binds.sqlUsage: @binds sql_id. Helps in finding bind values for a given sql_id (RAC aware)
kill_select.sqlUsage example: @kill_select "sql_id='xxxx'". Helps in preparing scripts for killing large number of sessions (current instance only)
locks.sql, locks_10g.sql, locks_details.sqlScripts to help investigating locks and blocking enqueues issues (RACaware)
login.sqlSql*plus startup/config script for my environment (windows client version 11.2.0.3)

Conclusions: command-line scripts can be of great help for monitoring and  performance troubleshooting of Oracle databases. Of particular interest are queries to print the details of active sessions, the SQL they execute and their wait events details. This makes for a quick and easy entry point to troubleshooting and to get an idea of what's happening on the DB server. More systematic troubleshooting may be needed beyond that. Command-line scripts are easy to modify and customize, many DBAs indeed have their own tool-set. In this post I have exposed some of the scripts I have written and still find useful: download from this link.

A few links to end this post. First of all the excellent session snapper script by Tanel Poder. In addition a mention to a few cool tools that seem to be a sort of bridge (missing link? :) between command-line and graphical-interface approaches: MOATS by Tanel Poder and Adrian Billington , SQL Dashboard by Jagjeet Singh and topaas by Marcin Przepiorowski.

No comments:

Post a Comment