Показват се публикациите с етикет LINQ to SQL. Показване на всички публикации
Показват се публикациите с етикет LINQ to SQL. Показване на всички публикации

Vesko on Performance – LINQ to SQL

четвъртък, 23 юли 2009 г.

Intro
In general some of the biggest performance problems of the applications seen nowadays are found in the database or the data access layer. This combined with ORM framework such as LINQ to SQL which is new for the development team, which encapsulates all of the queries made against the DB (and “of course” not profiled), which is not researched enough, could lead to very “sexy” results. Here I will try to show you a few common mistakes which I have seen through my everyday work with different developers from all over the world as well as the one that our students here at National Academy for Software Development make. At the end of this article you should know how to avoid such mistakes (not only the one that I’m showing here). I will provide you with the toolset along with the practices which do the work for me every day, and will hopefully do the work for you, too!

The Problems
1. GetAll() and GetCount() methods – this is a classical pair of methods. As you know they can have different variation such as GetAllBy[Something]() and respectively GetCountBy[Something](). I have found a lot of developers do the following:

But where is the performance problem? When somebody calls GetAll() the code just retrieves all the SampleObjects from the DB. This seems to be OK and it actually is. But what about the GetCount() method? It firstly fetches all of the SampleObjects in memory and then counts them. As you know the expected way of solving this task is to use the COUNT function in T-SQL. Of course the idea behind this GetCount() is simple – the developer tried to reuse the query defined in the GetAll() method. One would say – “Hm… I thought that LINQ to SQL executes everything possible in the SQL server. Why here it doesn’t do this?!?” The problem is that when you build the expression (assigned to the result variable) it returns value from IQuerable. Up to now this expression is just an expression – it doesn’t execute anything against the SQL server. However on the next line we have a call of the ToList() method which idea is to return list of SampleObjects. Thus here on this line we have a DB query which fetches the requested list.
Here we should solve two problems. The performance one is obvious – we fetch everything from the DB just to count it, which will impact the memory usage as well as the network traffic. The second problem is that we want to stick up to the Don’t Repeat Yourself principle (DRY, also known as Single Point of Truth and Single Point of Maintenance) – we don’t want to have the query for the GetAll() and the GetCount() twice. In practice it will differ only by the Count() method. Keep in mind that we could have more complex queries with filters, skips, orders, etc. so this could be very bad if we don’t obey the DRY principle here. The above snippet tried to solve this second note, but in a wrong way.
The following code snippet shows sample solution for both of the mentioned problems:

It just uses third method, which is private for the class and returns the common part of the expression as IQueriable. The two public methods just appends the query with a ToList() and Count() calls.

2. SelectMethod(…) and SelectCountMethod(…) for paging – this sample is similar to the above, but again is very common mistake.

This implementation totally breaks one of the main reasons of using paging – loading only a part of the whole content. You have enough knowledge from the point 1 both for finding the problem and for solving it.

3. The N+1 query problem – a nightmare from the past. This is a very performance intensive problem which becomes a classical in the software development folklore. The idea behind this problem is very straightforward – imagine that we have a list of SampleObjects and for each list we want to get which is his owner – SampleObjectOwner. Thus one could write one query for retrieving all of the SampleObjects (let’s say that their count is N) and for each object, we ask which is its owner. This means that we will make 1 + N queries for retrieving the needed data.
I will show you an even worst situation, in which we have to find the first object (if such exists) which have particular SampleObjectOwner as an owner.

The mistake here is similar to the above two, but here it’s the worst situation - the method GetByOwner(…) needs as much quires as the number of the sampleObjects + 1. This means that in the future, while your DB grows the performance will become worst and worst. The problem’s fix is so ironically easy – just compare…

The above code produces only a single query.

General advice
We have seen some problems, which weren’t found during the development process. The question is how to prevent ourselves from such things?
Firstly – one should definitely not use technology or framework in production without a good research. As we just have seen the problems have very straightforward and logical explanation. We should confess that not reading enough is a problem of many developers.
Second – as I mentioned in my previous post – use class like DebuggerWritter for profiling every method. Normally such problems could be found very easily just with one testing. Yes – you will need to have at least a vague idea about what kind of SQL statement should be composed by LINQ to SQL but this is a normal requirement for each software developer.
Third – stick up to the techniques shown in the solutions above. They represent a good pattern for writing DAL logic with LINQ to SQL.

Conclusion
Hopefully, now you should know more about LINQ to SQL and how to avoid some of the problems, which are commonly seen in the developers’ code. It will be very helpful to add your comments to that topic showing problems that you have had/seen in your work.

Thanks,
Vesko Kolev

Vesko on Performance – New Series of Posts

вторник, 21 юли 2009 г.

Intro
These days I have been working on a very interesting project here at National Academy for Software Development. I had to make a performance audit of a project on which two or three development teams have been working on previously.

The situation:
- The used technologies were .NET Framework 3.5, C# 3.0, ASP.NET 3.5, SQL Server 2005, LINQ and LINQ to SQL.
- 2-tier architecture.
- The coding style – very different (different dev teams from different parts of the world).
- The used practices – hard to find after so many patches.
- The performance – poor.

I’m preparing a short series of posts about the problems, which I could be found in this software as well as in many others. As you will see below not all of the problems could be fixed. However most of them could be and you can think of them as something which you should definitely avoid. I’ll provide you with some set of advices how to find/avoid such problems. They will include tools and practices which will help you to provide more robust and reliable software. I will be very happy to “hear” your comments about the written as well as your personal experience with similar problems.

Let’s start our performance journey with the first topic!



Shit of the Day: DataPager Binds Twice
While I was observing the behavior of one of the pages which seems to have problems, I found something very “interesting”. Firstly, let me give you some necessary info about this particular situation. The page had a ListView control, DataPager associated with it and ObjectDataSource against which the ListView was bound. As you know, the ObjectDataSource provides two properties – SelectMethod and SelectCountMethod which retrieve respectively the items with which the list will be bound as well as their count. Of course the data source should know also which class provides these methods, so there is also property called TypeName. You know that for the paging functionality we should be able to “ask” how many records are there as well as “give me the i-th page”. This in theory means that we should make two DB queries to “ask” these two questions.

But what happened in practice? I found that the select method and the count method were called twice. My first suspicion was that the control was bound twice. I start inspecting the code line by line searching for such thing. After a few minutes of debugging I convinced myself that the list was bound only once. The next step was to review the DAL (Data Access Layer) logic for potential problems. Again - nothing which could cause this behavior…

In such situation the only thing which could be done is googling. I searched for a while and found the following article from a Microsoft’s site – “DataPager double binds when paging using QueryStringField” which was confirmed to be a bug. I’ve quickly checked whether QueryStringField is used in my situation and voila – the problem was dissected.

The short description about this bug is: “When you use the QueryStringField with a DataPager it causes the ListView to bind twice; once as if it was on the first page, and a second time for whatever page is passed via the QueryString (even if it is also the first page). Naturally, this is very bad for performance and completely unnecessary.”

Currently Microsoft hasn’t provided any workarounds for this problem. So just try to avoid using this particular combination in your projects.

General advice
When you are developing particular page/control always check the queries which are made against your database both for loading and storing data. When using ORM frameworks such as LINQ to SQL it is very important to know exactly what queries are executed against your DB. You could use frameworks like Log4Net/Microsoft Enterprise Library Logging Application Block along with DataContext.Log property. You should definitely use (if not already) classes like this for outputting LINQ to SQL queries to the debugger output. These very simple practices will save your live from similar shit of the day.

Stay tuned and have fun!

 
Vesko Kolev's Blog : IDeveloper -