DW TIP: Get next or previous value with SQL Analytic Function [EN-US; PLSQL]

While querying our databases we might face a typical problema which is getting the next or the previous value of an atribute according to some rule applyed to a certain dataset. This happens more if you deal with datawarehouses and need to retrieve this kind of analytical information. The typical solutions involve several “group by” and sub queries to achieve the same result. This way SQL has a powerful feature  which are the Analytical functions.

Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

There are several analytic functions, however they vary from platform to platform. Microsoft T-SQL has a set of them and Oracles PLSQL has another set of them. If you want to leanr more about the available ones, follow the links provided at the end of this post. In this exemple we will see how to use these two:

  • Lead() – Use this analytic function in a SELECT statement to compare values in the current row with values in a following row
  • Lag()   – Use this analytic function in a SELECT statement to compare values in the current row with values in a previous row

To use one of the following you need to develop the following sintaxe in your select statement:

LEAD or LAG ( scalar_expression [ ,offset ] , [ default ] )   OVER ( [ partition_by_clause ] order_by_clause )

Now lets see what these clauses are:

  • Scalar expression: The value you want to record (Atribute value)
  • Offset: The number of rows forward from the current row from which to obtain a value (Default is 1)
  • Default: The value to return when scalar_expression at offset is NULL
  • Partition_by_clause: Divides the result set produced by the FROM clause into partitions to which the function is applied (Groups) – Optional
  • Order_by_clause: Determines the order of the data before the function is applied. When partition_by_clause is specified, it determines the order of the data in each partition. The order_by_clause is required

Example: select lead(EMP_SALARY,1)  OVER(partition by EMP_DEPARTMENT order by EMP_SALARY desc )  from teste_table

The previous exemple would return the highest salary next to the following respecting the Department partition by clause, which means the highest salary for the employer department.

HANDS-ON

Lets develop a simple example so that you can see the benefits of analytic function. For exemple, lets imagine that we have a table, created with the following DDL script (PLSQL):

NOTE: This example will have a simple table in which we have an ID and a record date. The purpose of this exemple is to retrieve the previous and the next and the previous value of the DT_INVOICE atribute for each row, based on the INV_ID. My goal is just to show you the benefits of using analytic functions.

CREATE TABLE AF_INVOICE
(
    INV_ID NUMBER(24)
    ,DT_INVOICE DATE
)

The next step is to insert some values in the table:

insert into AF_INVOICE values(1, (select Sysdate from dual) );
insert into AF_INVOICE values(1, (select Sysdate -1 from dual) );
insert into AF_INVOICE values(1, (select Sysdate -2 from dual) );
insert into AF_INVOICE values(1, (select Sysdate -3 from dual) );
insert into AF_INVOICE values(1, (select Sysdate -4 from dual) );
insert into AF_INVOICE values(2, (select Sysdate from dual) );
insert into AF_INVOICE values(2, (select Sysdate -1 from dual) );

If you query your table using a simple select statement you will find the following records:

Capture

Now its time to start playing with analytical functions. Lets apply what we have learned about analytic functions in a select statement to get the next date of invoice for each record. The sintaxe for this select statement is as follows:

select INV_ID
,DT_INVOICE
,lead(DT_INVOICE, 1) over(partition by INV_ID order by DT_INVOICE asc ) NEXT
from AF_INVOICE;

The result of this query is the following:

Capture3

In our last exemple we are going to apply the leade and lag function in the same query to get the previous (Lag with ofset 1), the next date of invoice (Lead with offset 1) and the seconf next value for this date (Lead with offset 2). The result query would be:

 

select INV_ID
 ,DT_INVOICE
 ,lead(DT_INVOICE,1) over(partition by INV_ID order by DT_INVOICE asc ) NEXT
 ,lag(DT_INVOICE,1) over(partition by INV_ID order by DT_INVOICE asc ) PREVIOUS
 ,lead(DT_INVOICE,2) over(partition by INV_ID order by DT_INVOICE asc ) SECOND_NEXT
from AF_INVOICE;

 

After executing this query the result dataset would be the following:

Capture2

 

If you want to search more about this functions, check these two URLs:

 

Thank you,

Rui Machado

Anúncios

Deixe uma Resposta

Preencha os seus detalhes abaixo ou clique num ícone para iniciar sessão:

Logótipo da WordPress.com

Está a comentar usando a sua conta WordPress.com Terminar Sessão / Alterar )

Imagem do Twitter

Está a comentar usando a sua conta Twitter Terminar Sessão / Alterar )

Facebook photo

Está a comentar usando a sua conta Facebook Terminar Sessão / Alterar )

Google+ photo

Está a comentar usando a sua conta Google+ Terminar Sessão / Alterar )

Connecting to %s