Wednesday, March 28, 2012

killing the duplicates from a table using sql

hi, what is a good way to kill the duplicates from a table. when i say killing duplicated, i mean killing all the rows for the repeated row.

WorkTempID ItemNo Seq
100196 RTP-22 1
100197 RTP-22 2
100198 RTP-22 3
100199 RTP-22 3
100200 RTP-22 4
100201 RTP-22 4
100202 RTP-22 5
100203 RTP-22 5

********************************************************
see how Seq 3, 4 and 5 are repeated? so for the output i want.

WorkTempID ItemNoSeq
100196RTP-221
100197RTP-222
********************************************************

i DO NOT want this as the output. i already know how to achive this using DISTINCT keyword

WorkTempID ItemNoSeq
100196RTP-221
100197RTP-222
100198RTP-223
100200RTP-224
100203RTP-225How big is the table?|||not that big.. few dozen rows. i am basically getting the job done by reading through the whole table, doing a count based on seq. if the count is more than 1, i update that row with "delete" as the ItemNo. at the end just deleting everyting that has "delete" for ItemNo. gets the job done but i think there is a better way to do this.|||Assuming that you actually want to remove the duplicates from the underlying table...
If the table isn't that big then consider something like this:


Declare @.tblTemp table (WorkTemplID int, Item char(10), Seq int)

insert into @.tblTemp
select distinct * from Items

truncate Items

insert into Items
select * from @.tblTemp

Not the most elegant code but very easy to understand|||thanks for the reply but it does not give me what i need. remember, i not only need to kill the duplicates but also the orginal row that is duplicated. if seq 3 is repeated 5 times DISTINCT keyword will give me 1 row that has seq 3 in it. but i dont want to get ANY rows with seq 3.|||create table #t1 (c1 int)
insert into #t1 values (1)
insert into #t1 values (2)
insert into #t1 values (3)
insert into #t1 values (3)
insert into #t1 values (4)
insert into #t1 values (4)
insert into #t1 values (5)
insert into #t1 values (5)

select * from #t1
group by c1
having count(c1) < 2|||aaah. Ok you want something like this...


delete <table>
from <table> ORG
inner join
(select <col1>, <col2>,etc from <table> group by <col1>, <col2>,etc
having count(*) > 1) <some table alias STA> on
STA.<col1> = ORG.<col1> and etc (for all cols)
|||create table #t1 (c1 int)
insert into #t1 values (1)
insert into #t1 values (2)
insert into #t1 values (3)
insert into #t1 values (3)
insert into #t1 values (4)
insert into #t1 values (4)
insert into #t1 values (5)
insert into #t1 values (5)

select *
into #t2
from #t1
group by c1
having count(c1) < 2

select * from #t2|||ok ok, here's my last go at a perfect template ...

create table #t1 (c1 int)
insert into #t1 values (1)
insert into #t1 values (2)
insert into #t1 values (3)
insert into #t1 values (3)
insert into #t1 values (4)
insert into #t1 values (4)
insert into #t1 values (5)
insert into #t1 values (5)

delete from #t1
where c1 in
(
select c1
from #t1
group by c1
having count(c1) > 1
)|||ok, my last attempt at making the perfect template for this ...

create table #t1 (c1 int)
insert into #t1 values (1)
insert into #t1 values (2)
insert into #t1 values (3)
insert into #t1 values (3)
insert into #t1 values (4)
insert into #t1 values (4)
insert into #t1 values (5)
insert into #t1 values (5)

delete from #t1
where c1 in (
select c1
from #t1
group by c1
having count(c1) > 1
)

select * from #t1

Richard101|||Richard101 that only works if you've got a single unique column. The original example has no unique key columns...wouldn't have a problem if it did.

I just want to see you write out a few more templates :)|||ok, although my idea of a template is something that works, reduced to it's minimum, that you can build up.

right, using your data...

--
create table #t1 (c1 varchar(10), c2 varchar(10), c3 int)
insert into #t1 values ('100196', 'RTP-22', 1)
insert into #t1 values ('100197', 'RTP-22', 2)
insert into #t1 values ('100198', 'RTP-22', 3)
insert into #t1 values ('100199', 'RTP-22', 3)
insert into #t1 values ('100200', 'RTP-22', 4)
insert into #t1 values ('100201', 'RTP-22', 4)
insert into #t1 values ('100202', 'RTP-22', 5)
insert into #t1 values ('100203', 'RTP-22', 5)

delete from #t1
where c3 in
(
select c3
from #t1
group by c3
having count(c3) > 1
)

select * from #t1
--

Richard101|||Teehee. I was assuming that that a duplicates had to be c1 AND c2 AND c3. My fault. So how would you write that one Richard? ;)

No comments:

Post a Comment