Showing posts with label techfun. Show all posts
Showing posts with label techfun. Show all posts

Saturday, August 29, 2015

Add Color to Your SQL

Topic: this post is about some simple techniques to add color to SQL scripts and their terminal output using ANSI escape codes.

Colors can be used to improve the output of command line tools. This is common practice, for example with the (bash) shell. Colors can also be very useful to improve the quality of the output of SQL scripts. In my experience this is not used frequently, probably because of the the need of specialized techniques and also because the results depend on the terminal emulator (or tool) used to display the output. In this post you will find some pointers and examples that you can use to add color to your SQL output.

Notable previous work

Tanel Poder has published a couple of years ago his very cool logo to the great snapper v4 script using SQL and color-rich terminal output.
Sayan Malakshinov has published a blog article and a script color.sql that provide some ready-to-use and simple techniques for coloring the output of Oracle SQL*plus scripts.
ANSI escape codes are the main underlying technique to add color to the terminal output, see more details on how this works at: Wikipedia article on ANSI escape codes.
Putty is a widely usedterminal emulator that support ANSI escape calls. If you are a Windows user, note that CMD.EXE does not support ANSI escape codes, therefore it will not be suitable to run the scripts described in this post.
Heat map visualization is a powerful technique to explore 3D data, by providing the third dimension as color. I have integrated heat map visualization and command line/ terminal output with two tools OraLatencyMap and PyLatencyMap aimed at the study of I/O latency. I will share in the next paragraph some of the tips and lessons learned from developing those tools regarding the use of color on the terminal output.

Color palettes by example

Color palettes are very useful for heat map visualization. I have identified two simple palettes for displaying I/O latency histograms on terminal output: the first one is composed of shades of blue, the other is yellow-to-red. See an example of their usage as heat maps at this link. The scripts Color_palette_blue.sql and Color_palette_yellow-red.sql show two basic examples of how to generate color palettes using ANSI escape codes. The SQL, also pasted here below, works simply by changing the character background color, printing a white space and finally resetting the background back to normal:

define ANSICODE_PREFIX="chr(27)||'[48;5;'"
define ANSICODE_BACKTONORMAL="chr(27)||'[0m'"

select 0 ID, &ANSICODE_PREFIX|| '0m '|| &ANSICODE_BACKTONORMAL COLOR from dual
UNION ALL  -- Black
select 1 ID, &ANSICODE_PREFIX|| '15m '|| &ANSICODE_BACKTONORMAL COLOR from dual  
UNION ALL  -- White
select 2 ID, &ANSICODE_PREFIX|| '51m '|| &ANSICODE_BACKTONORMAL COLOR from dual  
UNION ALL  -- Light blue
select 3 ID, &ANSICODE_PREFIX|| '45m '|| &ANSICODE_BACKTONORMAL COLOR from dual
UNION ALL 
select 4 ID, &ANSICODE_PREFIX|| '39m '|| &ANSICODE_BACKTONORMAL COLOR from dual  
UNION ALL 
select 5 ID, &ANSICODE_PREFIX|| '33m '|| &ANSICODE_BACKTONORMAL COLOR from dual
UNION ALL 
select 6 ID, &ANSICODE_PREFIX|| '27m '|| &ANSICODE_BACKTONORMAL COLOR from dual
UNION ALL  -- Dark blue 
select 7 ID, &ANSICODE_PREFIX|| '21m '|| &ANSICODE_BACKTONORMAL COLOR from dual; 

The generation of the ANSI escape codes has little complexity and can be ported with little effort to many other language of interest. Here an example in PL/SQL (from OraLatencyMap):

create or replace function asciiescape_color (p_token pls_integer, p_palette_type varchar2) 
return varchar2 
   is
      type t_palette is varray(7) of pls_integer;        -- a palette of 7 colors
      v_palette_blue t_palette := t_palette(15,51,45,39,33,27,21);      -- shades of blue
      v_palette_red t_palette := t_palette(15,226,220,214,208,202,196); -- white-yellow-red
      v_colornum pls_integer;
   begin
      if ((p_token < 0) or (p_token > 6)) then
          raise_application_error(-20001,'The color palette has 7 colors, 0<=p_token<=6, found instead:'||to_char(p_token));
      end if; 

      if (p_palette_type = 'blue') then
          v_colornum := v_palette_blue(p_token+1);                               
      else
          v_colornum := v_palette_red(p_token+1);                                 
      end if;
      return(chr(27)||'[48;5;'||to_char(v_colornum)||'m');   
      --return the ascii escape sequence to change background color 
end;
/

An example in Python (from PyLatencyMap):

def asciiescape_color(token, palette):
    blue_palette = {0:15, 1:51, 2:45, 3:39, 4:33, 5:27, 6:21}        # palette, shades of blue 
    red_palette  = {0:15, 1:226, 2:220, 3:214, 4:208, 5:202, 6:196}  # white-yellow-red palette
    if palette == 'blue':
       color_asciival = blue_palette[token]
    elif palette == 'red':
       color_asciival = red_palette[token]
    else:
       raise Exception('Wrong or missing palette name.')
       exit(1)
    return(chr(27) + '[48;5;' + str(color_asciival) + 'm')


Other ANSI escape codes of interest from OraLatencyMap and PyLatencyMap are the codes to restore the cursor back to the normal value and to clear the screen. Here is an example from PyLatencyMap (Python):

#reset the background color back to normal
asciiescape_backtonormal = chr(27) + '[0m'

# clear screen and move cursor to top line
line += chr(27) + '[0m' + chr(27) + '[2J' + chr(27) + '[H' 

An example of colorizing SQL output

You will see in this paragraph an example that I hope is both instructive and fun: how to add colors to a script that computes an image of the Mandelbrot set. The starting script is quite interesting by itself as it uses just SQL for computation and output display. The code is not original, I have ported it to Oracle from code on the PostgreSQL wiki, with some minor modifications. Mandelbrot_SQL_Oracle_text.sql is the "black and white" starting starting script before adding color.
Adding colors to the output for this example is just a matter of combining the original "black and white" script with the SQL scripts for generating color palettes. The results are the the following two scripts (see also the figure below for an example of their output):

The figure illustrates how to add color to some classes of SQL output. The examples use a test script which generates an image of the Mandelbrot set using SQL. Note that adding color to the plain text version is achieved here just by adding an extra table join (with the PALETTE common table expression). Other methods to add color to Oracle scripts include using PL/SQL functions or the use of other languages, languages, such as Python, as discussed in the text of this post.

A version of the script for PostgreSQL that uses the ideas discussed in this post for colorizing SQL is: Mandelbrot_SQL_PostgreSQL_color_blue.sql

Conclusions

Colors can improve the effectiveness of command line scripts and terminal output, including SQL scripts. ANSI escape codes provide a powerful tool for many terminal operations. Heat map visualization is a powerful data visualization technique that can be implemented also on the terminal output using ANSI escape codes. In this post you can find simple tips on how to add colors to the terminal output both for SQL and other languages, notably Python. Adding color to "black and white" script output can be fun and useful at the same time, as illustrated with the Mandelbrot SQL example. Happy coloring!

Download: The tools discussed in this post can be downloaded from this webpage and from Github

Additional links and references

Tanel's colored fished logo to the snapper v4 script
Sayan Malakshinov's blog article and script color.sql
Wikipedia on ANSI escape codes
Latency heat maps for I/O latency measurements on the CLI with OraLatencyMap and PyLatencyMap
Fun SQL snippets from the PostgreSQL wiki
Additional examples of recursive common table expression (recursive subquery factoring) and non-standard uses of SQL: how to find numeric solutions to basic physics equations using SQL



Wednesday, July 18, 2012

Recursive Subquery Factoring, Oracle SQL and Physics

Introduction: this post  is about the use of recursive subquery factoring (recursive common table expressions) in Oracle to find numerical solutions to the equations of classical mechanics, for demonstration purposes. This technique will be introduced using 4 examples or growing complexity. In particular by the end of the post we will be able to compute the length of the (sidereal) year using the laws of physics and SQL.

Recursive subquery factoring (aka recursive common table expressions) is a feature first introduced in Oracle version 11gR2 that has been used by many both for business and for play. See for example this post on Jonathan Lewis blogthis post on solving sudokus and this about the Mandelbrot setThe computational strength of the feature comes from the fact that it makes recursion easily available from within SQL and so opens the door to natural ways of implementing several algorithms for numerical computation. 
Additional note: the amount of physics and maths used have been kept to a minimum and simplified where possible, hopefully without introducing too many mistakes in the process. 

Example 1: projectile in gravity field


The motion is on 1 dimension, say the x axis. Our first task is to find a representation of the state of the system with a structure that can be computed in the database. A natural choice is to use the tuple (t, x, v, a). Where t is time, x is the position along the 1-dimensional axis of motion, v is the velocity and a is the acceleration.
Newton's second law (F=ma) gives us the rule of the game when we want to compute the state of the system subject to and external force. Using calculus this can be written as:  


Next we need a method to find numerical (approximate) solutions. We can use the Euler integration method, which basically means doing the following:

Another important point is that we need to know the initial conditions for the state tuple, in particular initial  position and velocity. In the example we will take x(t=0)=0 m and v(t=0)=100 m/sec. The force is the gravitational pull at the surface of the Earth: F=--m*g, where g=9.81 m/sec^2. Note the mass cancels out in our equations. This example models the motion of a projectile that we shoot vertically into the air and we observe as it rises up to about 500 meters and then falls back down, all this happens in about 20.5 seconds. The SQL used and a graph of the results are here below:

define dT=0.1
define g=9.81
define maxT=20

-- SQL to compute the motion of a projectile subject to gravity
WITH data(t,x,v,a) AS (
 SELECT cast(0 as binary_double) t, cast(0 as binary_double) x, cast (100 as binary_double) v, cast(-&g as binary_double) a FROM dual
 UNION ALL
 SELECT t+&dT, x+v*&dT, v+a*&dT, -&g FROM data WHERE t < &maxT
)
SELECT t,x,v,a FROM data;



Example 2: Harmonic oscillator


In this example we investigate the motion of a mass attached to a spring. We expect the mass to oscillate around a central point (x=0).
For greater accuracy in calculations we use a different integration method: the Verlet integration method (see also this link). The equation for the acceleration is: a = F/m =-K*x and initial conditions are: x(t=0)=1 m and v(t=0)=0 m/sec. Moreover we take K0.1 sec^-2See below the SQL used for the calculation and a graph of the results.

define dT=0.1  
define K=0.1
define maxT=20

-- SQL to compute the motion of a harmonic oscillator
WITH data(t,x,v,a) AS (
 SELECT cast(0 as binary_double) t, cast(1 as binary_double) x, cast (0 as binary_double) v, cast(-&K*1 as binary_double) a FROM dual
 UNION ALL
 SELECT t+&dT, x+v*&dT+0.5*a*&dT*&dT, v+0.5*(a-&K*(x+v*&dT))*&dT, -&K*(x+v*&dT+0.5*a*&dT*&dT) FROM data WHERE t < &maxT
)
SELECT t,x,v,a FROM data;




Example 3: Motion of the Earth around the Sun


The motion of the system Earth-Sun is a problem of 2 bodies moving in space. With a 'standard trick' this can be reduced to a problem of 1 body, and 2 spatial variables, which represent the (vector) distance of the Earth from the Sun in the place of the orbit. Our description of the system will use the following tuple: (t,x,vx,ax,y,vy,ay), that is time, position, velocity and acceleration for a 2-dimensional problem in the (x,y) plane. The equation for the force is Newton's law of universal gravitationAnother important point again is to use the correct initial conditions. These can be taken from astronomical observations, x(t=0)=152098233 Km (also know as the aphelion point) and v=29.291 Km/sec (credits to this link and this other link). We will use again the Verlet integration method as in example 2 above. Note, for ease of computation, time and space will be re-scaled so that t=1 means 1000 sec and x=1 means 1 Km (same is true for the y axis). The SQL used is pasted here below as well as a graph of the computed results, that is our approximate calculation of the Earth's orbit

-- length unit = 1 km
-- time unit = 1000 sec
define dT=.1
define aph=152098233
define GM=132712838618000000
-- note this is GM Sun + Earth (reduced mass correction)
define v_aph=29291
define maxT=40000

-- SQL to compute the trajectory of the Earth around the Sun
WITH data(step, t,x,y,vx,vy,ax,ay) AS (
 SELECT 0 step,cast(0 as binary_double) t,cast(&aph as binary_double) x,cast(0 as binary_double) y,
        cast(0 as binary_double) vx, cast(&v_aph as binary_double) vy,
        cast(-&GM/power(&aph,2) as binary_double) ax, cast(0 as binary_double) ay  FROM dual
 UNION ALL
 SELECT step+1, t+&dT, x+vx*&dT+0.5*ax*&dT*&dT, y+vy*&dT+0.5*ay*&dT*&dT,
        vx+0.5*(ax-&GM*(x+vx*&dT)/power(x*x+y*y,1.5))*&dT,
        vy+0.5*(ay-&GM*(y+vy*&dT)/power(x*x+y*y,1.5))*&dT,
        -&GM*(x+vx*&dT+0.5*ax*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5),
        -&GM*(y+vy*&dT+0.5*ay*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5)
 FROM data WHERE t < &maxT
)
SELECT t,x,y,vx,vy,ax,ay FROM data WHERE mod(step,100)=0; -- output only one point every 100




Example 4: Compute the length of the sidereal year using the equations of motion of the Earth around the Sun and SQL


As a final example and an 'application' of the techniques above we can use SQL to compute the number of days year (or rather in a sidereal year, see the link for additional details). We find a result that is in agreement with measurements with 6 significant digits. This is an interesting result considering that it is obtained with just a few lines of SQL!

-- SQL to compute the number of days in one sidereal year
-- A sidereal year is the time taken by the Earth to orbit
-- the Sun once with respect to the fixed stars.
-- This builds on the equation and SQL discussed in example N.3
define dT=.1
define aph=152098233
define GM=132712838618000000
define v_aph=29291
define maxT=40000

select round(t*1000/(3600*24),3) days_in_a_sidereal_year  from (
 WITH data(step, t,x,y,vx,vy,ax,ay) AS (
  SELECT 0 step,cast(0 as binary_double) t,cast(&aph as binary_double) x,cast(0 as binary_double) y,
        cast(0 as binary_double) vx, cast(&v_aph as binary_double) vy,
        cast(-&GM/power(&aph,2) as binary_double) ax, cast(0 as binary_double) ay  FROM dual
  UNION ALL
  SELECT step+1, t+&dT, x+vx*&dT+0.5*ax*&dT*&dT, y+vy*&dT+0.5*ay*&dT*&dT,
        vx+0.5*(ax-&GM*(x+vx*&dT)/power(x*x+y*y,1.5))*&dT,
        vy+0.5*(ay-&GM*(y+vy*&dT)/power(x*x+y*y,1.5))*&dT,
        -&GM*(x+vx*&dT+0.5*ax*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5),
        -&GM*(y+vy*&dT+0.5*ay*&dT*&dT)/power(power(x+vx*&dT,2)+power(y+vy*&dT,2),1.5)
  FROM data WHERE t < &maxT
 )
 SELECT step,t,y,lead(y) over(order by step) next_y FROM data
) where y<0 and next_y>0;


DAYS_IN_A_SIDEREAL_YEAR
-----------------------
                365.256


Conclusions:

We have discussed in 4 examples the usage of Oracle SQL and in particular of recursive subquery factoring (recursive common table expressions), applied to solve selected cases of differential equations of classical mechanics. Despite the simplifications and approximations involved, the technique has allowed us to compute the length of the sidereal year with good precision. This method can be extended to make calculation for more complex systems, such as systems of many particles.

Monday, June 11, 2012

Hash Collisions in Oracle: SQL Signature and SQL_ID

Topic: Discussion and implementation of a method for finding hash collisions for sql_id and SQL signature in Oracle.

Introduction:

MD5 hashing is used by Oracle to compute sql_id and SQL signature (see also a previous blog post on the topic). Those hash values are normally used as if they were a unique representation of a given SQL statement. However collisions (2 different statements with the same hash value) can happen or, as it is the case of this post, will happen!

The underlying math is a short discussion of the birthday attack: Let's say you walk into a room an meet 30 people: shake hand with the first person you see and also disclose each other's the birthday. There is about 1 chance out 365 that that particular person has the same birthday as you. If you repeat this process with the rest of the 30 people in the room, chances of a 'positive event' increase but still stay below 10%. However when all the people in the room repeat the procedure of shaking hands and sharing birthday details, probabilities add up quickly (there are N(N-1)/2 different handshakes possible in a room of N people). The non-intuitive result is that it is likely that at least 2 people share a common birthday already in such a small group. See the link for the details.

The main point is that to find collisions by brute force on a hash function of n bits we don't need to generate O(2^n) hash values but rather the square root of that O(2^n/2). In particular this means that the complexity of finding hash collisions for 32-bit and 64-bit hash functions by brute force are down to numbers that are easily computable with current HW. sql_id and signatures are indeed 64-bit hashes, as discussed here, so let's give it a try!

Warm-up, collisions on hash_value:

The hash_value of a a given SQL statement (as shown in V$SQL.hash_value for example) is a 32-bit hash. Collisions can be found using about 100K random hashes. A simple algorithm using Oracle and PL/SQL is:
  • Take a couple of different SQL statements and append to them a randomly-generated string
  • Compute the MD5 hash and take the number of bytes that are needed (4 bytes in this case)
  • Insert the hashes in a table
  • Repeat ~100K times
  • When finished look in the output table for duplicates in the hash value
  • If duplicates are found, those are the hash_value collisions
The code can be found at this link. The execution takes just a few seconds and displays the collisions that have been found (results varies for each execution, normally 2-5 collisions are found for this size of search).

SQL> @hash_birthday_sql hashtab 10 4
SQL> @find_dup hashtab

-- these 2 statements are different but have the same hash_value (=3577689134)
SQL> select owner, index_name from dba_indexes where rownum<=9 --CTUUewnYYPJXFuFyAzcMtXTVNmPtNgAV;
SQL> select owner, table_name from dba_tables where rownum<=10 --ooXUUTNfekKkRZGlCJfiPlxKcoCfpDAh;

Scaling up to 64-bit hash: sql_id collisions

The same algorithm used for hash_value collisions can be used for sql_id, although we need to scale up the calculation: the hash is 8 byte long now and we need ~10 billion random hashes to find a few meaningful collisions. SQL> @hash_birthday_sql hashtab 500000 8 would take about 60 hours to do such calculation (mileage may vary). A faster way is to split the execution in smaller chunks and parallelize  the execution across multiple CPU and possibly RAC nodes. See the link here for an example on 2 RAC nodes and parallelism 10 each.

Results: the following two SQL statements are different but have the same sql_id (ayr58apvbz37z) and hash_value (1992264959).

SQL> select owner, index_name from dba_indexes where rownum<=9 --BaERRzEYqyYphBAvEbIrbYYDKkemLaib;
SQL> select owner, table_name from dba_tables where rownum<=10 --XhiidvehudXqDpCMZokNkScXlQiIUkUq;

--confirm this by running:
SQL> select sql_text from v$sql where sql_id='ayr58apvbz37z';

Another 64 bit hash, SQL signature collisions

A variation from the theme above is to calculate statements that have a SQL signature collision. The signature is used by SQL profiles, baselines and SQL patches. It is calculated in a similar but different way than sql_id, in particular SQL text is normalized before hashing (see this link for additional details). The base script is hash_birthday_signature.sql again parallel computation can be used to speed up the computation.

Results: the following two SQL statements are different but have the same signature (6284201623763853025), note sql_id values are different.

SQL> SELECT TABLE_NAME FROM USER_TABLES --JO7HWQAPNFYEALIOA3N5ODTR46INWY6N;
SQL> SELECT INDEX_NAME FROM USER_INDEXES --PAOZY8VWGDHDS9NICVREGTR22J6C24DQ;

--confirm this by running:
select sql_text from v$sql where force_matching_signature=6284201623763853025;

How does Oracle cope with collisions

In a few tests I have seen dbms_xplan throwing ORA-01422 when displaying plans for sql_id with collisions. There may be other packages/tools that make the assumption that sql_id values don't have collisions and they may throw errors.
These are a few more problems with SQL signature collisions. In particular a source of potential trouble is that one of the main (and common) underlying tables for  SQL plan management (baselines, profiles, patches) which stores the mappings of sql_text to signature, SYS.SQL$TEXT, has a unique index on signature. However we have seen that it is quite unlikely to have a signature collision just by pure chance (i.e. in the normal usage of the DB).

Conclusions:

This post reports on an exercise to compute collisions for sql_id and SQL signatures of Oracle statements using the Oracle database and PL/SQL. We have seen how some simple math, called birthday attack, can be of help and how an implementation of this in Oracle is quite easy up to hash length of 64 bit. We also report on the findings, notably collisions of sql_id and SQL signature.

Additional comments: a collision on full_hash_value (as reported in v$db_object_cache) could not be calculated with this method as it is a 128 bit hash and the brute force method would require resources out of a 'reasonable scale' . However, since 2005 there are methods known to find collisions for MD5 hashes on the full 128 bits that use weaknesses in the algorithm (that's why MD5 should not be used for security purposes any more). Those methods could be used to find full_hash_value collision.

PS: SQL and scripts mentioned in this article can be found following this link