LINQ is one of the significant improvements of .Net and from the time it has been added till today I am always using it in all my projects (I forgot how I was using DB before!). But recently I have faced a weird issue. This is not a LINQ issue, but you need to keep it into consideration. Let’s say we have a table with below structure and data inside:
CREATE TABLE [dbo].[Test]( [Id] [INT] IDENTITY(1,1) NOT NULL, [NDate] [datetime] NOT NULL, CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] SET IDENTITY_INSERT [dbo].[Test] ON INSERT [dbo].[Test] ([Id], [NDate]) VALUES (1, '2011-08-24 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (2, '2011-08-24 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (3, '2011-08-24 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (4, '2011-08-24 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (5, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (6, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (7, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (8, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (9, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (10, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (11, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (12, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (13, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (14, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (15, '2011-08-21 00:00:00.000') INSERT [dbo].[Test] ([Id], [NDate]) VALUES (16, '2011-08-21 00:00:00.000') SET IDENTITY_INSERT [dbo].[Test] OFF
Now if you want to run below LINQ operation for pagination, you will be surprised:
using(var db = new TestDataContext()) { var page_1_items = db.Test.OrderByDescending(i=> i.NDate).Take(10).ToList(); var page_2_items = db.Test.OrderByDescending(i=> i.NDate).Skip(10).Take(10).ToList(); }
The page_1_items will be as following:
Id NDate 4 2011-08-24 00:00:00.000 3 2011-08-24 00:00:00.000 2 2011-08-24 00:00:00.000 1 2011-08-24 00:00:00.000 11 2011-08-21 00:00:00.000 10 2011-08-21 00:00:00.000 9 2011-08-21 00:00:00.000 8 2011-08-21 00:00:00.000 7 2011-08-21 00:00:00.000 6 2011-08-21 00:00:00.000
And page_2_items will be:
Id NDate 11 2011-08-21 00:00:00.000 12 2011-08-21 00:00:00.000 13 2011-08-21 00:00:00.000 14 2011-08-21 00:00:00.000 15 2011-08-21 00:00:00.000 16 2011-08-21 00:00:00.000
Did you noticed that some items are missing (item with ID 5)? and also item with ID equals to 11 will be repeated two times! Well, this was surprising me as well. The reason behind this is, generated query to load first page uses TOP X, but for other pages uses ROW_NUMBER(). This is for improving the performance, but this can make trouble for you as well. Below you can find generated queries:
/* For first page */ SELECT TOP 10 [Id],[NDate] FROM [Test] ORDER BY NDate DESC /* For second page */ SELECT [Id[,[NDate] FROM ( SELECT ROW_NUMBER() OVER (ORDER BY NDate DESC) AS RNumber, Id,NDate FROM [Test]) AS t WHERE RNumber BETWEEN 11 AND 20 /* Supposed to return same result as TOP 10, but it is different!! */ SELECT [Id[,[NDate] FROM ( SELECT ROW_NUMBER() OVER (ORDER BY NDate DESC) AS RNumber, Id,NDate FROM [Test]) AS t WHERE RNumber BETWEEN 1 AND 10
To solve the issue, you need to add secondary ordering statement to ORDER BY statement (in this case we need to add Id as well). So whenever you are sorting your items and performing pagination, consider secondary ordering as well. This is not a perfect solution, because there are lots of cases where you want to allow user to choose ordering column (like when listing items in grid view, etc.). If you are wondering why LINQ is behaving like this, there might be some performance considerations in doing like this, since using TOP X does not require any other operation and it is more likely to be used (usually you want to show most viewed or commented or recent items, which can be addressed by TOP X easily).