Removing Duplicate Rows In T-SQL

By Wayne Walter Berry - January 06, 2008

Been doing some interviewing lately and finding that people think they are really good at T-SQL (claiming they are a 6 out of 10), however almost none of them write their T-SQL by hand. If you are using the WYSIWYG query builder you are a 3 out of 10, or less. Here is one of my favorite interview problems that comes up in real life (my life) a lot.

Given a table like this:

CREATE T1 (ID int PRIMARY KEY, Name varchar(50))

That has duplicate rows in the name column, write a query to "clean" the duplicates leaving only on row per name. So if "Tom Smith" shows up in two rows (ID: 2, and 4), the query would delete either rows with ID 2 or 4 and leave the other one. My favorite answer is:

DELETE T1
WHERE NOT T1.ID IN (SELECT MIN(T1.ID) FROM T1 GROUP BY Name)

This includes an IN clause where is terrible for performance -- however the best solution I know. Most people try it with a cursor which is much harder.

Note: This is how you "clean" a table of duplicates before you add a unique index to the name column. A typical scenario when you have a lookup table that has some extra data and you are normalizing the database.

{6230289B-5BEE-409e-932A-2F01FA407A92}

Search This Blog

Project 31-A

Removing Duplicate Rows In T-SQL

Comments

Post a Comment

Popular posts from this blog

Simple WP7 Mango App for Background Tasks, Toast, and Tiles: Code Explanation

Error : /ScriptResource.axd : Invalid viewstate.

Running Multiple Threads on Windows Azure