Wednesday, March 28, 2012

Kind of simple join question. Not sure if its that

Hello,

the query

Select Year, ID, Count(*)
From
dbo.Table
Where ID like '22%'

gives me this

Year ID Count
2000 222 2
2000 223 1
2001 222 4

But I need this:

Year ID Count
2000 222 2
2000 223 1
2001 222 4
2001 223 0

In Words, any ID ever listed in at least one Year should be listed in every Year. What would be the most efficient way for this?

Thank you!!cross join of distinct years to distinct ids, then left outer join back to dataselect y.year
, i.id
, count(d.id)
from (
select distinct year
from dbo.Table
where ID like '22%'
) as y
cross
join (
select distinct ID
from dbo.Table
where ID like '22%'
) as i
left outer
join dbo.Table as d
on d.year = y.year
and d.ID = i.ID
group
by y.year
, i.id|||Well, that's a quick answer :) Thank you|||Just one more thing:

The Select ... Where plz like ... takes lot of time (about 2 minutes) because the where clause is more complex.

So, to get a result, it takes 2 minutes. If i want it ordered the way I mentioned it will take 2 * (number columns) , and there are about 5 of them.

Isn't there a way to run the "Select statement" once and then do some "sql ordering"?|||Isn't there a way to run the "Select statement" once and then do some "sql ordering"?yes, it's called the ORDER BY clause

:)

as for your execution times, you may need to declare some indexes on your tables|||So I have to rerun the Select on the dbo.Table for every column? No way to do the Distincts and joins on the "Result"?|||i'm sorry, could you repeat the question please?

you said "If i want it ordered the way I mentioned it will take 2 * (number columns) , and there are about 5 of them."

first of all, you didn't mention anything

secondly, how many ways do you want to sort the results of the query i gave you?|||I'd start with:CREATE TABLE #summary (
[year] INT
, [id] INT
, [c] INT
)

INSERT INTO #summary (
[year], [id], [c]
) SELECT [year], [id], Count(*)
FROM dbo.Table
WHERE [id] LIKE '22%'

SELECT y.[year], i.[id], c.c
FROM SELECT DISTINCT [year]
FROM #summary) AS y
CROSS JOIN (SELECT DISTINCT [id]
FROM #summary) AS i
LEFT JOIN #summary AS c
ON (c.[year] = y.[year]
AND c.[id] = i.[id])
ORDER BY y.[year], i.[id]The temporary table gets the data of interest in just one pass instead of multiple passes, as well as reducing the overall row count for further processing by r937's suggested JOIN.

If the [id] column is an INT, then no index will help performance. If [id] is a CHAR column, then an index might help a lot.

-PatP|||Thank you Pat, the idea of a temporary table is exactly what I was looking for to avoid multiple passes.
I thought one could have some kind of a virtual temporary table, which does not have to be created before. I mean, if I do a select and get the result, this result is kind of a table, isn't it?
What I tried first was doing a "With summary as (Select ..." and later do the distincts and joins (what I called ordering before, sorry) on that summary. I hoped, that by this way, there will only be one pass. But that was a wrong assumption. Seems using the name of a with clause in another from clause is just kind of a link to that select in the with and not the result.
I hope you got me. First, I'm not very good in english, second, I'm new to databases.

So, till next time ;-)|||nice one with the temp table, pat

how did you know the square brackets would not cause a syntax error?

this is the SQL forum, not the SQL Server forum!! ;)|||OMG, a temp table is a virtual table. Sorry, still have to learn a lot :)|||I thought one could have some kind of a virtual temporary table, which does not have to be created before.
The implementation of "WITH" clauses (CTEs) may differ from one RDBMS to an other. Anyhow, it should be equivalent to a temp table (in terms of performance) if that's more performant than a nested table expression (as in r937's solution).
At least, that's what a "good" optimizer should find out by itself.
So the following should be at least as performant as Pat Phelan's solution: WITH summary (year, id, c) AS
(SELECT year, id, COUNT(*)
FROM dbo.Table
WHERE id LIKE '22%'),
y (year) AS (SELECT DISTINCT year FROM summary),
i (id) AS (SELECT DISTINCT id FROM summary)
SELECT y.year, i.id, summary.c
FROM y CROSS JOIN i LEFT OUTER JOIN summary USING (year,id)
ORDER BY y.year, i.id|||i'm still wondering what the other 5 columns are, and how the order needs to be varied from the way it was first mentioned ;)|||how did you know the square brackets would not cause a syntax error?The use of dbo.Table told me that it had to be either Microsoft or Sybase. When Silas reported timings for the process, I knew that your virtual table syntax had worked, which meant that Silas had to be using Microsoft SQL. Knowing that allowed me to continue down the engine-specific path a bit further than you had initially gone.
this is the SQL forum, not the SQL Server forum!! ;)Correct!i'm still wondering what the other 5 columns are, and how the order needs to be varied from the way it was first mentioned
;) Operative word being "think", I think that Silas only had one order in mind, but that wasn't the order provided by the CROSS JOIN query. By using the temp table and applying an ORDER BY to the result set, I think Silas has got what they wanted.
OMG, a temp table is a virtual table. Sorry, still have to learn a lot A temp table isn't exactly a virtual table, but it is pretty darned close. Especially when used inside a stored procedure.

-PatP|||i'm still wondering what the other 5 columns are, and how the order needs to be varied from the way it was first mentioned ;)

What I try is to mimic the functionality of a pivot table. The columns are dimensions.
Example:

Year Location Type Size ... Count(*) AVG(Price) AVG(Visits) ...
2000 123 A 10-15 5 9000 14
2000 123 A 15-20 76 5040 99
2000 123 A 40-80 12 540 7
2000 123 B 10-15 25 330 22
2000 123 B 15-20 12 3330 1
2000 123 B 40-80 NULL NULL NULL
...

It's a bit complex. But temp table is a good thing for drill-down and drill-up.
What I called ordering before has nothing to do whith the order by command. What I meant was the cross join, so the result in excel is shown in a correct way. That means even there is no object e.g. with size 40-80 I want the row to be shown. I need that for the Charts.
The With clause doesn't work very well in this case (for sql server), because it creates multiple passes. Not just one.

I think Silas has got what they wanted.
Yes we do :-)

No comments:

Post a Comment