A View Inside My Head

Jason's Random Thoughts of Interest

NAVIGATION - SEARCH

Using PIVOT and RANK Together

A friend of mine (name withheld, I didn't actually ask if I could blog this... ;-) asked for advice to what appears to be a simple problem until you try to implement it.  Consider the following somewhat normalized table:

AccountNum

Name

Email

0851774002 

John Doe  

jd@foo.com

0851774003   

John Doe   

jd@foo.com

0851774001   

John Doe   

jd@foo.com

0851774100   

John Doe   

jd@foo.com

0851693000   

Bob Public   

bob@bar.com

1138299000   

Jane Doe   

JaneD@baz.com

1353452000   

Jane Doe   

JaneD@baz.com

1028030000   

Jane Doe 

JaneD@baz.com

0851636000   

Jane Doe   

JaneD@baz.com


What he wanted was to collapse the data to one row per person, with a column for each Account Number.  That is, he needed to pivot the table.

When you pivot a table, unique values in the source column that you pivot on become new columns in the resulting table.  So, in this case, it would not make sense to pivot on the AccountNum column, because the result would be a new column named [0851774002], another one named [0851774003], etc.

Instead, an intermediate step needed to be performed that introduced a value that could be pivoted on.  This value needed to be consistent across the individual people (so that the first record for everybody contained the same value in this new column, the second record for everybody contained the same value, etc).

SQL Server 2005 introduced Ranking functions that provide the ability to rank a record within a partition.  In this case, we can use RANK() to assign a unique number for each record, and partition by the person's name (so that the RANK will reset for each person).  By prefixing some text to the rank number, we end up with something like:

SELECT Name, Email, AccountNum, 'AccountNum' + CAST(RANK() OVER ( PARTITION BY Name, Email ORDERBY AccountNum ) AS VARCHAR(10)) R FROM myTable Results: Name Email AccountNum R =========== ============== ========== ============ Bob Public bob@bar.com 0851693000 AccountNum1 Jane Doe JaneD@baz.com 0851636000 AccountNum1 Jane Doe JaneD@baz.com 1028030000 AccountNum2 Jane Doe JaneD@baz.com 1138299000 AccountNum3 Jane Doe JaneD@baz.com 1353452000 AccountNum4 John Doe jd@foo.com 0851774001 AccountNum1 John Doe jd@foo.com 0851774002 AccountNum2 John Doe jd@foo.com 0851774003 AccountNum3 John Doe jd@foo.com 0851774100 AccountNum4

The new column (R) is the concatenation of the literal string "AccountNum" and the string representation of the number that the RANK function returned.  But the bigger point is that now this column can be used for pivoting, and result in a series of new columns called [AccountNum1], [AccountNum2], [AccountNum3], etc.

Pivoting in SQL Server 2005 requires explicit declaration of values as a column list.  In this case, we can't just say "Pivot on the R column", but rather must say "Pivot on the R column, and make new columns only for these specific values".  This restriction is a little bit of a downside because we need knowledge of the values in the column.  Or, in this case, we need to know how many possible Account Numbers a person could possibly have so that we create enough columns in the result.

The entire solution is as follows:

SELECT * FROM ( SELECT Name, Email, AccountNum, 'AccountNum' + CAST ( RANK() OVER ( PARTITION BY Name, Email ORDER BY AccountNum ) AS VARCHAR(10)) R FROM myTable ) AS rankedSource PIVOT ( MAX (AccountNum) FOR R IN ( [AccountNum1], [AccountNum2], [AccountNum3], [AccountNum4], [AccountNum5], [AccountNum6], [AccountNum7], [AccountNum8], [AccountNum9], [AccountNum10] ) ) AS pivottable

 

And the results (showing only two of the AccountNum columns, even though there are actually 10)

Name Email AccountNum1 AccountNum2 ========== ============= =========== =========== Bob Public bob@bar.com 0851693000 NULL Jane Doe JaneD@baz.com 0851636000 1028030000 John Doe jd@foo.com 0851774001 0851774002