A friend of mine (name withheld, I didn't actually ask if I could blog this... ;-) asked for advice to what appears to be a simple problem until you try to implement it. Consider the following somewhat normalized table:
AccountNum
|
Name
|
Email
|
0851774002
|
John Doe
|
jd@foo.com
|
0851774003
|
John Doe
|
jd@foo.com
|
0851774001
|
John Doe
|
jd@foo.com
|
0851774100
|
John Doe
|
jd@foo.com
|
0851693000
|
Bob Public
|
bob@bar.com
|
1138299000
|
Jane Doe
|
JaneD@baz.com
|
1353452000
|
Jane Doe
|
JaneD@baz.com
|
1028030000
|
Jane Doe
|
JaneD@baz.com
|
0851636000
|
Jane Doe
|
JaneD@baz.com
|
What he wanted was to collapse the data to one row per person, with a column for each Account Number. That is, he needed to pivot the table.
When you pivot a table, unique values in the source column that you pivot on become new columns in the resulting table. So, in this case, it would not make sense to pivot on the AccountNum column, because the result would be a new column named [0851774002], another one named [0851774003], etc.
Instead, an intermediate step needed to be performed that introduced a value that could be pivoted on. This value needed to be consistent across the individual people (so that the first record for everybody contained the same value in this new column, the second record for everybody contained the same value, etc).
SQL Server 2005 introduced Ranking functions that provide the ability to rank a record within a partition. In this case, we can use RANK() to assign a unique number for each record, and partition by the person's name (so that the RANK will reset for each person). By prefixing some text to the rank number, we end up with something like:
SELECT Name,
Email,
AccountNum,
'AccountNum' +
CAST(RANK() OVER ( PARTITION BY Name, Email ORDERBY AccountNum ) AS VARCHAR(10)) R
FROM myTable
Results:
Name Email AccountNum R
=========== ============== ========== ============
Bob Public bob@bar.com 0851693000 AccountNum1
Jane Doe JaneD@baz.com 0851636000 AccountNum1
Jane Doe JaneD@baz.com 1028030000 AccountNum2
Jane Doe JaneD@baz.com 1138299000 AccountNum3
Jane Doe JaneD@baz.com 1353452000 AccountNum4
John Doe jd@foo.com 0851774001 AccountNum1
John Doe jd@foo.com 0851774002 AccountNum2
John Doe jd@foo.com 0851774003 AccountNum3
John Doe jd@foo.com 0851774100 AccountNum4
The new column (R) is the concatenation of the literal string "AccountNum" and the string representation of the number that the RANK function returned. But the bigger point is that now this column can be used for pivoting, and result in a series of new columns called [AccountNum1], [AccountNum2], [AccountNum3], etc.
Pivoting in SQL Server 2005 requires explicit declaration of values as a column list. In this case, we can't just say "Pivot on the R column", but rather must say "Pivot on the R column, and make new columns only for these specific values". This restriction is a little bit of a downside because we need knowledge of the values in the column. Or, in this case, we need to know how many possible Account Numbers a person could possibly have so that we create enough columns in the result.
The entire solution is as follows:
SELECT *
FROM
( SELECT Name,
Email,
AccountNum,
'AccountNum'
+ CAST ( RANK() OVER ( PARTITION BY Name, Email
ORDER BY AccountNum ) AS VARCHAR(10)) R
FROM myTable
) AS rankedSource
PIVOT
(
MAX (AccountNum)
FOR R IN
( [AccountNum1], [AccountNum2], [AccountNum3],
[AccountNum4], [AccountNum5], [AccountNum6],
[AccountNum7], [AccountNum8], [AccountNum9],
[AccountNum10]
)
) AS pivottable
And the results (showing only two of the AccountNum columns, even though there are actually 10)
Name Email AccountNum1 AccountNum2
========== ============= =========== ===========
Bob Public bob@bar.com 0851693000 NULL
Jane Doe JaneD@baz.com 0851636000 1028030000
John Doe jd@foo.com 0851774001 0851774002
0e002873-9b02-4fd3-8bb6-c733142f4d2d|2|3.0|96d5b379-7e1d-4dac-a6ba-1e50db561b04