تبلیغات
یادداشتهای علی سالخورده حقیقی - Returning the Top X row for each group
 
یادداشتهای علی سالخورده حقیقی
We Provide Solutions for The Future Developement.
درباره وبلاگ


دریافت لوح تقدیر از سازمان ملل به عنوان مشاور فناوری اطلاعات سازمان در انتخابات برون مرزی سال 2004.

مدیر وبلاگ : علی سالخورده
نویسندگان
نظرسنجی
از کدامیک از آنتی ویروس های زیر استفاده میکنید و راضی هستید؟








آمار وبلاگ
  • کل بازدید :
  • بازدید امروز :
  • بازدید دیروز :
  • بازدید این ماه :
  • بازدید ماه قبل :
  • تعداد نویسندگان :
  • تعداد کل پست ها :
  • آخرین بازدید :
  • آخرین بروز رسانی :

In our table we have a list of runners, the time the course was completed in and the runner's age. Our task is to retrieve the fastest X runners in each age range.

CREATE TABLE #Runners
(
Runner integer
NOT NULL,
Time integer
NOT NULL,
Age integer
NOT NULL
)
INSERT INTO #Runners
SELECT 1 , 10 , 20 UNION ALL
SELECT 2 , 15 , 20 UNION ALL
SELECT 3 , 11 , 20 UNION ALL
SELECT 4 , 12 , 30 UNION ALL
SELECT 5 , 18 , 30 UNION ALL
SELECT 6 , 9 , 40 UNION ALL
SELECT 7 , 16 , 40 UNION ALL
SELECT 8 , 13 , 30

By far the simplest way is to use a combination of the ranking functions ( Specifically row_number () ) and CTE 's (Common Table Expressions). This may not always be the most efficient, but I'll come back to that point later.

Our first task is to ascertain the order that the runners completed the course in. If we were interested in only the overall, age range aside, order we could simply

Select * 
from #Runners
order by Time

and then by adding then TOP clause we can filter the Top X (assuming 2 here)

Select top(2) * 
from #Runners
order by Time

However, we do need the TOP X for each age category. As mentioned above we will be using the row_number() ranking function to help us do this. If we were to execute

select * ,row_number() over (order by Time ) as RowN 
from #Runners
order by Rown

Runner      Time        Age         RowN
----------- ----------- ----------- --------------------
6           9           40          1
1           10          20          2
3           11          20          3
4           12          30          4
8           13          30          5
2           15          20          6
7           16          40          7
5           18          30          8

We get effectively the same output as before but now with an added column, which is an incrementing integer in the order of time. If there were ties and we wanted each of them treated the same then we would use the DENSE_RANK function rather than ROW_NUMBER. In some systems if there is a tie, for example two runners finished in exactly the same time for first place, then there is no following position. That means no second place runner and we skip to third. We would then use the RANK function. In our example, however, we are going to ignore the possibility of ties.

An extension of the Over clause is "PARTITION BY", which will re-initialize the counter for each group specified with it. This is usable for all the Ranking functions and pre-existing aggregate functions. PARTITION BY is very similar to using a GROUP BY clause except now we are able to have a different clause for each ranking or aggregate function used. To see this is action

select * ,row_number() over (partition by Age order by Time ) as RowN 
from #Runners
order by Age,Rown

So now RowN is an incrementing counter ordered by time for each distinct value within the age column. This is the column that we will now need to filter upon to find our TOP X runners. In an ideal world we would be able to execute:

select * ,row_number() over (partition by Age order by Time ) as RowN 
from #Runners
where row_number() over (partition by Age order by Time ) <=2
order by Age,Rown

However we cannot, so the currently suggested method is to solve this by using a CTE. This will enable us to now filter on the RowN Column.

with cteRunners 
as
(
select * ,row_number() over (partition by Age order by Time ) as RowN
from #Runners
)
Select * from cteRunners
where RowN <=2
order by Age,Rown

Itzik Ben-Gan has proposed using a QUALIFY clause as detailed here to resolve this issue.

The caveat

Although this will work perfectly fine, performance over a large number of rows can be relatively poor as the engine will have to read every row in the table and for those rows, evaluate the row number function so that it can be filtered upon. This can be quite a waste of resources if we only require a small amount of the total number of rows. To prove the point, first we need to generate some test data,

IF OBJECT_ID('tempdb..#RunnersBig') IS NOT NULL drop table #RunnersBig 
go
Create Table #RunnersBig
(
RunnerId integer
identity ,
Time integer
not null,
Age integer
not null
)
go
insert into #runnersbig ( Time , Age )
select top 1000000 ABS ( checksum ( newid ()))% 1000 ,
ABS ( checksum ( newid ()))% 99
from sys . columns a cross join sys . columns b cross join sys . columns c
go
create index idxrunnersbig on #runnersbig ( age , time ) include ( runnerid )

This will generate a table with one million rows of randomized values and then create an index upon it. Our alternative approach will be to use the CROSS APPLY Operator and a tally table is used as our 'LEFT' table.

with cteN 
as
(
select number from master .. spt_values
where type = 'p' and number between 0 and 100
)
Select *
from cteN cross apply ( Select top ( 2 ) * from #RunnersBig where #RunnersBig . Age = cteN . number order by Time ) as runners
order by cteN . number , runners . Time

When compared to this filtered row_number query in Profiler:

with cteRunners 
as
(
select * , row_number () over ( partition by Age order by Time ) as RowN
from #RunnersBig
)
Select * from cteRunners
where RowN <= 2
order by Age , Rown
go

We will find that the filtered version executes in ~463ms whereas the CROSS APPLY a mere 1 ms YMMV.






نوع مطلب : پایگاه داده SQL، نکته ها و ترفندها، 
برچسب ها :
لینک های مرتبط :

       نظرات
1389/09/19
علی سالخورده
1396/05/17 12:59
Hello, I want to subscribe for this blog to obtain latest updates, thus where can i do it please help.
1396/05/15 16:28
I every time spent my half an hour to read this web site's articles
or reviews every day along with a cup of coffee.
1396/05/7 00:57
It's amazing to go to see this website and reading the views of all mates concerning this
article, while I am also keen of getting familiarity.
1396/05/1 07:43
magnificent issues altogether, you just won a new reader.
What could you recommend in regards to your
submit that you made a few days ago? Any certain?
1396/02/13 06:37
Hi there, You've done an incredible job. I'll definitely digg
it and personally recommend to my friends. I am confident
they will be benefited from this site.
1396/01/25 05:46
May I simply just say what a relief to find someone that genuinely
understands what they're discussing over the internet.

You certainly know how to bring an issue to light and make it important.
More and more people ought to read this and understand this side of your story.
I can't believe you aren't more popular since you definitely possess the gift.
1396/01/21 17:47
Hi there, I enjoy reading through your article.
I like to write a little comment to support you.
1396/01/21 14:42
Spot on with this write-up, I honestly believe that this amazing site needs much more attention. I'll
probably be back again to read through more, thanks for the info!
 
لبخندناراحتچشمک
نیشخندبغلسوال
قلبخجالتزبان
ماچتعجبعصبانی
عینکشیطانگریه
خندهقهقههخداحافظ
سبزقهرهورا
دستگلتفکر