AdvancedCase: Do the SQL Shuffle

A little puzzle arose when I was looking an anonymising client data. At one point I wondered if simply shuffling around some fairly innocuous text fields would be enough to separate some items from other parts of the data. The answer was no, but I still spent some time in the evening figuring out how I'd do it if I needed to - remind me to Get A Life at some point! :)

How to shuffle?

Suppose you have a simple table with an ID field, and a text description. You want to shuffle the 'des' data around so that they all still exist, but in different rows to where they started.

ID			DES
11			line 11
12			line 12
13			line 13
14			line 14
15			line 15
16			line 16
17			line 17
21			line 21
23			line 23

You can play with this using a temporary table (don't forget to DROP it later) :

-- create temp test data
CREATE TABLE #DataTable (
ID int,
Des varchar(10) );

WITH DATA AS (
   SELECT 11 AS ID, 'line 11' AS dez UNION ALL
   SELECT 12, 'line 12' UNION ALL
   SELECT 13, 'line 13' UNION ALL
   SELECT 14, 'line 14' UNION ALL
   SELECT 15, 'line 15' UNION ALL
   SELECT 16, 'line 16' UNION ALL
   SELECT 17, 'line 17' UNION ALL
   SELECT 21, 'line 21' UNION ALL
   SELECT 23, 'line 23' )
INSERT INTO #DataTable (ID, Des)
SELECT * FROM DATA

-- Insert Shuffle code here :)

SELECT * FROM #DataTable;
DROP TABLE #DataTable;

After finding that Rand() only executes once per SQL statement and is therefore useless for a random value per row..... eventually I found that Abs(CheckSum(NewID())) %90 is a handy way to get the kind of random values I was after. Deciding to try a multi-stage CTE approach, I got this far :

With d1 AS
(
SELECT -- shove a random value in
ID,
Des,
Abs(CheckSum(NewID())) % 90 as Randm
FROM #DataTable
),
d2 as
(
SELECT -- now get what we need with order sequences
ID as ID,
Row_number() OVER (ORDER BY ID) AS OrigRow,
Row_number() OVER (ORDER BY Randm) as NewRow,
Des as ChangedDes
FROM d1
)
SELECT * FROM d2 ORDER BY OrigRow

This gave me the following, and you can play with it at
http://sqlfiddle.com/#!3/fcba5/4/0

ID	ORIGROW	NEWROW	CHANGEDDES
11	1	8	line 11
12	2	1	line 12
13	3	7	line 13
14	4	2	line 14
15	5	5	line 15
16	6	6	line 16
17	7	9	line 17
21	8	3	line 21
23	9	4	line 23

After some deep thought (very deep for that time of night) before retiring to my chamber, I came up with the idea of joining one row to another from the CTEs, from old order to new random order, and linking that to the original row being updated. :

-- Shuffle
With d1 AS
(
SELECT
ID, Des, Abs(CheckSum(NewID())) % 90 as Randm
FROM #DataTable
),
d2 as
(
SELECT
ID as ID,
Row_number() OVER (ORDER BY ID) AS OrigRow,
Row_number() OVER (ORDER BY Randm) as NewRow,
Des as ChangedDes
FROM d1
)
UPDATE #DataTable
SET Des =
(SELECT swap.ChangedDes
FROM d2 swap
INNER JOIN d2 old ON
swap.NewRow = old.OrigRow
WHERE old.ID = #DataTable.ID
)
FROM #DataTable

That does the job nicely, and runs quickly enough too. See it in action at
http://sqlfiddle.com/#!3/fcba5/6/0

If you have any better ways to do it, I'm all ears. Comments welcomed, as always!

AdvancedCase

Wednesday 26 March 2014

Do the SQL Shuffle

How to shuffle?

No comments:

Post a Comment