Tuesday, June 5, 2012

SQL Signature, Text Normalization and MD5 Hash

Topic: A discussion on how Oracle computes SQL signature using MD5 hashing and on how SQL text is normalized before computing SQL signatures.

Introduction and warm-up:

SQL statements that are cached in the library cache are associated to a hash value, which is visible in various forms across a few V$ views. In 11gR2 for example: V$DB_OBJECT_CACHE.FULL_HASH_VALUE, V$SQL.SQL_ID, V$SQL.HASH_VALUE.

From the work of Tanel, Marcin and Slavik we learn that the full hash is an md5 hash in hexadecimal, sql_id is a base-32 representation of the trailing 8 bytes of the same md5 hash and finally the good old hash_value is a decimal representation of the trailing 4 bytes of the md5 hash.
Here below an example to illustrate how one can easily reproduce Oracle's hash values from md5 calculations (see also the blog entries mentioned above for more details):

--11gR2 hash from library cache, in 10g use x$kglob.kglnahsv  
SQL> select hash_value, to_char(hash_value,'xxxxxxxx') hex_hash_value, full_hash_value from GV$DB_OBJECT_CACHE where name='select 1 from dual';

----------- -------------- --------------------------------
2866845384  aae096c8       7d4dc9b423f0bcfb510272edaae096c8

The same  value can be calculated with md5_with_chr0.sql, a simple script based on DBMS_OBFUSCATION_TOOLKIT.MD5. Note chr(0) is added to the sql text before hashing.

SQL> @md5_with_chr0 'select 1 from dual'

7d4dc9b423f0bcfb510272edaae096c8 --> same value as full_hash_value above

SQL signature calculation and SQL text normalization

SQL signature is another type of hashing for SQL statements used by Oracle to support the features of plan management with SQL profiles, SQL patches, SQL plan baselines.

1) SQL statements are 'normalized'. Oracle performs transformations to the text before calculating the signature. Among others extra spaces are removed and case is ignored (see also oracle doc on V$SQL). An example here below to illustrate this: 2 SQL statements that differ only on case and additional whitespaces are shown to have the same SQL signature. Spoiler: normalization of more complex statements has some additional surprises, see below.

SQL> select DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE('select  1  from   dual') signature_with_space, DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE('SELECT 1 FROM DUAL') signature_normalized_sql from dual;

-------------------- ------------------------
12518811395313535686     12518811395313535686

2) Force matching is an additional option/feature of SQL signatures. It is used to match the text of SQL that uses literals. Oracle performs an additional transformation, that is literals are transformed into system-generated names for bind variables before hashing. An example below using literals (the value 1). The signature is converted in hexadecimal in this example, as we will use that for the following.

select to_char(DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE('SELECT 1 FROM DUAL'),'xxxxxxxxxxxxxxxx') exact_match_signature_hex,
       to_char(DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE('SELECT 1 FROM DUAL',force_match=>1),'xxxxxxxxxxxxxxxx') force_match_signature_hex from dual;

------------------------- -------------------------
 adbbc0a2f3c68ac6         9289f992520d5a86

3) Computing SQL signatures directly with MD5. The script md5.sql  computes md5 hashes of the input (note it does not add a trailing chr(0)). We can reproduce the values obtained before and therefore confirm what is the transformation that Oracle does when calculating signatures.


d9f6df623a086c51adbbc0a2f3c68ac6  -> see exact_match in (2)above


a995b29240a442869289f992520d5a86  -> see force_match in (2) above

4) Additional investigations: when we have lists of columns or tables Oracle transforms ",", in " , " so actually in this case whitespaces may be added to the original statement and not removed as we have seen in point (1) above! An example here below:

SQL> select to_char(DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE('SELECT ID,ID FROM DUAL,DUAL',force_match=>1),'xxxxxxxxxxxxxxxx') force_match_signature_hex from dual;





The signature of SQL statements is calculated by Oracle using MD5 on the normalized SQL text. Normalization steps include removing extra white-space characters, converting the text to upper case. Additional normalization steps concern comma-separated list of objects: the item separator is  transformed into ' , ' (space, comma, space).
The MD5 hash calculated on the normalized SQL text is truncated to the last 8 bytes and is displayed in V$ views as a decimal number (SQL signature).
In the case of force matching signatures literals are additionally replaced with:"SYS_B_<n>" (where <n> is a decimal number, see example above), very much like the case when using cursor_sharing=force.
Note: a trailing chr(0) is added by Oracle when calculating hash_values/sql_id but not when calculating SQL signatures.

md5.sql and md5_with_chr0.sql scripts can be found at this link.

No comments:

Post a Comment