Monday, February 20, 2012

Keep 1000 newest records... delete all the others

Ok... I'm trying to keep the newest 1000 records in my table,
and delete all the others.
There's got to be an easier/quicker way than this:
I create a derived table, sort it, looking at the 1000 newest
records... then ANOTHER derived table, sort it, looking at the oldest
of those 1000. Then using THAT output to delete all the records older
that record #1000 date.
This sample code keeps the 3 newest records... and deletes 2 older
records...
but the principle should be the same when I use it to keep 1000 newest
records...
and delete about 20000 older records... in my actual application.
DECLARE @.MyTable TABLE
(
MyDate DateTime
)
INSERT INTO @.MyTable VALUES('05-Jan-2005')
INSERT INTO @.MyTable VALUES('02-Jan-2005')
INSERT INTO @.MyTable VALUES('03-Jan-2005')
INSERT INTO @.MyTable VALUES('04-Jan-2005')
INSERT INTO @.MyTable VALUES('01-Jan-2005')
SELECT *
FROM @.MyTable
WHERE MyDate<
(
SELECT TOP 1 MyDate
FROM
(
SELECT TOP 3 MyDate
FROM @.MyTable
ORDER BY MyDate DESC
) AS t1
ORDER BY MyDate ASC
)--BEGIN PGP SIGNED MESSAGE--
Hash: SHA1
I get the same results w/ this:
select top 2 * from @.mytable order by mydate asc
Just the order is different.
To me, newest means the latest dates, not the earliest dates. E.g.:
5-Jan-2005 is newer than 1-Jan-2005
If you want the newest dates (my definition) you'd use something like
this:
select top 1000 <column list>
from <table name>
order by date_column desc
If you want the oldest date first:
select * from
(select top 1000 <column list>
from <table name>
order by date_column desc) as a
order by date_column
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
--BEGIN PGP SIGNATURE--
Version: PGP for Personal Privacy 5.0
Charset: noconv
iQA/ AwUBQlCRsYechKqOuFEgEQLfJACgkvk5KEhp6oZ0
1Cz2CpOmKDUlux0AmgNC
zAClof1qV9omRj72uLahmLPt
=Ug3Z
--END PGP SIGNATURE--
"A_Michigan_User" wrote:
> Ok... I'm trying to keep the newest 1000 records in my table,
> and delete all the others.
> There's got to be an easier/quicker way than this:
> I create a derived table, sort it, looking at the 1000 newest
> records... then ANOTHER derived table, sort it, looking at the oldest
> of those 1000. Then using THAT output to delete all the records older
> that record #1000 date.
> This sample code keeps the 3 newest records... and deletes 2 older
> records...
> but the principle should be the same when I use it to keep 1000 newest
> records...
> and delete about 20000 older records... in my actual application.
> DECLARE @.MyTable TABLE
> (
> MyDate DateTime
> )
> INSERT INTO @.MyTable VALUES('05-Jan-2005')
> INSERT INTO @.MyTable VALUES('02-Jan-2005')
> INSERT INTO @.MyTable VALUES('03-Jan-2005')
> INSERT INTO @.MyTable VALUES('04-Jan-2005')
> INSERT INTO @.MyTable VALUES('01-Jan-2005')
> SELECT *
> FROM @.MyTable
> WHERE MyDate<
> (
> SELECT TOP 1 MyDate
> FROM
> (
> SELECT TOP 3 MyDate
> FROM @.MyTable
> ORDER BY MyDate DESC
> ) AS t1
> ORDER BY MyDate ASC
> )
>|||So I guess you could
DELETE TableName WHERE YourKey NOT IN (SELECT TOP 1000 * FROM TableName
ORDER BY YourKey ASC (or) DESCc)
One curiosity of the TOP n is that it seems to behave differently during a
regular select clause than it does in the above!
Consider a simple table with an integer column. The values for the Column
are 1,2,3,3,3,4,5.
SELECT TOP 3 * returns 1,2,3
Whereas the above DELETE statement deletes 2 rows and you are left with
1,2,3,3,3
I must confess I don't know what is going on!!! - unless SQL is preventing a
Set overlap in which case depending on the distribution of your
Keys, you may or may not end up deleting everything but 1000 rows'
""A_Michigan_User"" <Michigan_RE_M0VE@.ameritech.net> wrote in message
news:ed6PD5KOFHA.164@.TK2MSFTNGP12.phx.gbl...
> Ok... I'm trying to keep the newest 1000 records in my table,
> and delete all the others.
> There's got to be an easier/quicker way than this:
> I create a derived table, sort it, looking at the 1000 newest
> records... then ANOTHER derived table, sort it, looking at the oldest
> of those 1000. Then using THAT output to delete all the records older
> that record #1000 date.
> This sample code keeps the 3 newest records... and deletes 2 older
> records...
> but the principle should be the same when I use it to keep 1000 newest
> records...
> and delete about 20000 older records... in my actual application.
> DECLARE @.MyTable TABLE
> (
> MyDate DateTime
> )
> INSERT INTO @.MyTable VALUES('05-Jan-2005')
> INSERT INTO @.MyTable VALUES('02-Jan-2005')
> INSERT INTO @.MyTable VALUES('03-Jan-2005')
> INSERT INTO @.MyTable VALUES('04-Jan-2005')
> INSERT INTO @.MyTable VALUES('01-Jan-2005')
> SELECT *
> FROM @.MyTable
> WHERE MyDate<
> (
> SELECT TOP 1 MyDate
> FROM
> (
> SELECT TOP 3 MyDate
> FROM @.MyTable
> ORDER BY MyDate DESC
> ) AS t1
> ORDER BY MyDate ASC
> )
>|||alex, i don't understand.. this does exacly what it's expected to do.
in your example, after the inner 'select top 3..' is executed, what you're
left with is something like this:
delete tablename where yourkey not in (1,2,3) --iow, in(4,5)
what you see as an anomaly is result of yourkey column not being unique.
dean
"Alex Potter" <apotter@.videotron.ca> wrote in message
news:e8KcVPMOFHA.1948@.TK2MSFTNGP14.phx.gbl...
> So I guess you could
> DELETE TableName WHERE YourKey NOT IN (SELECT TOP 1000 * FROM TableName
> ORDER BY YourKey ASC (or) DESCc)
> One curiosity of the TOP n is that it seems to behave differently during
> a
> regular select clause than it does in the above!
> Consider a simple table with an integer column. The values for the Column
> are 1,2,3,3,3,4,5.
> SELECT TOP 3 * returns 1,2,3
> Whereas the above DELETE statement deletes 2 rows and you are left with
> 1,2,3,3,3
> I must confess I don't know what is going on!!! - unless SQL is preventing
> a
> Set overlap in which case depending on the distribution of your
> Keys, you may or may not end up deleting everything but 1000 rows'
>
> ""A_Michigan_User"" <Michigan_RE_M0VE@.ameritech.net> wrote in message
> news:ed6PD5KOFHA.164@.TK2MSFTNGP12.phx.gbl...
>|||Why 1000 rows? Might this requirement not change if the volume of
business goes up or down? A more logical method (and much more
efficient) would seem to be to base this on a date rather than a fixed
value:
DELETE FROM YourTable
WHERE dt <= DATEADD(DAY,-28,CURRENT_TIMESTAMP)
You could schedule this as a regular job if you need to.
DISPLAYING only 1000 rows is a different matter. You don't need to
delete rows just because you don't want to display them.
David Portas
SQL Server MVP
--|||> Why 1000 rows? Might this requirement not change if the volume of
> business goes up or down? A more logical method (and much more
> efficient) would seem to be to base this on a date rather than a fixed
I need to keep the 1000 newest. (That's *MORE* than we'll really need. But
we *DO* want
to keep "a certain number"... not... "delete based by dates".)

> DISPLAYING only 1000 rows is a different matter. You don't need to
> delete rows just because you don't want to display them.
I need to "delete". (My example was written to "display"... just so I
could see what I was about to delete... before
actually deleting.)|||I also tried the "NOT IN" method... and it seemed to work... but I was
assuming that might be slower to execute.
I "think" that it's doing this... but it might not be:

> Get 1000 newest records... sort them... but return the 20000 older
> records instead.
> Make a massive list of 20000 comma separated dates. (Ugh)
> Delete them... 1-by-1.
> If the same date appears 5000 times... search for it each time... and
> delete it each time. (Double ugh.)
Maybe MS-SQL is smart enough to know NOT to do it the hard way... and is
optimizing it automatically.
I really don't know.
What I'd *REALLY* like to do... is find the date of record number 1001
(sorted by date, newest first)... and
just do a single, quick delete... based on that 1 date.
DELETE FROM myTable
WHERE MyDate<@.thatDate
Done.
But what's the easiest/quickest way to find the date of the 1001st record
(sorted by date, newest first)?
(Hopefully with a single SQL statement.)
Thanks for everyone's good ideas.

> DELETE TableName WHERE YourKey NOT IN (SELECT TOP 1000 * FROM TableName
> ORDER BY YourKey ASC (or) DESCc)
> One curiosity of the TOP n is that it seems to behave differently during
> a
> regular select clause than it does in the above!
> Consider a simple table with an integer column. The values for the Column
> are 1,2,3,3,3,4,5.
> SELECT TOP 3 * returns 1,2,3
> Whereas the above DELETE statement deletes 2 rows and you are left with
> 1,2,3,3,3
> I must confess I don't know what is going on!!! - unless SQL is preventing
> a
> Set overlap in which case depending on the distribution of your
> Keys, you may or may not end up deleting everything but 1000 rows'
>
> ""A_Michigan_User"" <Michigan_RE_M0VE@.ameritech.net> wrote in message
> news:ed6PD5KOFHA.164@.TK2MSFTNGP12.phx.gbl...
>|||""A_Michigan_User"" <Michigan_RE_M0VE@.ameritech.net> wrote in message
news:Oyd1hxVOFHA.1476@.TK2MSFTNGP09.phx.gbl...
>I also tried the "NOT IN" method... and it seemed to work... but I was
>assuming that might be slower to execute.
> I "think" that it's doing this... but it might not be:
>
> Maybe MS-SQL is smart enough to know NOT to do it the hard way... and is
> optimizing it automatically.
> I really don't know.
> What I'd *REALLY* like to do... is find the date of record number 1001
> (sorted by date, newest first)... and
> just do a single, quick delete... based on that 1 date.
> DELETE FROM myTable
> WHERE MyDate<@.thatDate
> Done.
> But what's the easiest/quickest way to find the date of the 1001st record
> (sorted by date, newest first)?
> (Hopefully with a single SQL statement.)
> Thanks for everyone's good ideas.
something like this?
delete myTable
where myDate<(select min(myDate) from (select top 1001 myDate from myTable
order by myDate desc))
dean|||Dean, I couldn't get your example to work... unless I changed that SELECT
TOP x statement
to a derived table. Then it seems to do exactly what I need. Thanks.
DECLARE @.MyTable TABLE
(
MyDate DateTime
)
INSERT INTO @.MyTable VALUES('05-Jan-2005')
INSERT INTO @.MyTable VALUES('02-Jan-2005')
INSERT INTO @.MyTable VALUES('03-Jan-2005')
INSERT INTO @.MyTable VALUES('04-Jan-2005')
INSERT INTO @.MyTable VALUES('06-Jan-2005')
INSERT INTO @.MyTable VALUES('07-Jan-2005')
INSERT INTO @.MyTable VALUES('08-Jan-2005')
INSERT INTO @.MyTable VALUES('09-Jan-2005')
SELECT *
FROM @.MyTable
WHERE MyDate<
(
SELECT MIN(MyDate)
FROM
(
SELECT TOP 3 MyDate
FROM @.MyTable
ORDER BY MyDate DESC
) AS t
)
"Dean" <dvitner@.nospam.gmail.com> wrote in message
news:%23CfaC6VOFHA.2704@.TK2MSFTNGP15.phx.gbl...
> ""A_Michigan_User"" <Michigan_RE_M0VE@.ameritech.net> wrote in message
> news:Oyd1hxVOFHA.1476@.TK2MSFTNGP09.phx.gbl...
>
> something like this?
> delete myTable
> where myDate<(select min(myDate) from (select top 1001 myDate from myTable
> order by myDate desc))
> dean
>|||> I get the same results w/ this:
> select top 2 * from @.mytable order by mydate asc
> Just the order is different.
> To me, newest means the latest dates, not the earliest dates. E.g.:
> 5-Jan-2005 is newer than 1-Jan-2005
I agree. (But I've never heard anyone say "the socks I bought back in 1998
are newer than the
ones I just bought in 2005.)
Newer = more recently = 05-Jan-2005 = MAX(MyDate) = DESC sorted
Older = long ago = 01-Jan=1998 = MIN(MyDate) = ASC sorted
I hope I've got that correct. (But I can't find anything in my example...
or anyone's reply... that says otherwise.)
> If you want the newest dates (my definition) you'd use something like
> this:
> select top 1000 <column list>
> from <table name>
> order by date_column desc
> If you want the oldest date first:
> select * from
> (select top 1000 <column list>
> from <table name>
> order by date_column desc) as a
> order by date_column
> --
> MGFoster:::mgf00 <at> earthlink <decimal-point> net
> Oakland, CA (USA)
> --BEGIN PGP SIGNATURE--
> Version: PGP for Personal Privacy 5.0
> Charset: noconv
> iQA/ AwUBQlCRsYechKqOuFEgEQLfJACgkvk5KEhp6oZ0
1Cz2CpOmKDUlux0AmgNC
> zAClof1qV9omRj72uLahmLPt
> =Ug3Z
> --END PGP SIGNATURE--
>
> "A_Michigan_User" wrote:

No comments:

Post a Comment