Vesko on Performance – LINQ to SQL

четвъртък, 23 юли 2009 г.

Intro
In general some of the biggest performance problems of the applications seen nowadays are found in the database or the data access layer. This combined with ORM framework such as LINQ to SQL which is new for the development team, which encapsulates all of the queries made against the DB (and “of course” not profiled), which is not researched enough, could lead to very “sexy” results. Here I will try to show you a few common mistakes which I have seen through my everyday work with different developers from all over the world as well as the one that our students here at National Academy for Software Development make. At the end of this article you should know how to avoid such mistakes (not only the one that I’m showing here). I will provide you with the toolset along with the practices which do the work for me every day, and will hopefully do the work for you, too!

The Problems
1. GetAll() and GetCount() methods – this is a classical pair of methods. As you know they can have different variation such as GetAllBy[Something]() and respectively GetCountBy[Something](). I have found a lot of developers do the following:

But where is the performance problem? When somebody calls GetAll() the code just retrieves all the SampleObjects from the DB. This seems to be OK and it actually is. But what about the GetCount() method? It firstly fetches all of the SampleObjects in memory and then counts them. As you know the expected way of solving this task is to use the COUNT function in T-SQL. Of course the idea behind this GetCount() is simple – the developer tried to reuse the query defined in the GetAll() method. One would say – “Hm… I thought that LINQ to SQL executes everything possible in the SQL server. Why here it doesn’t do this?!?” The problem is that when you build the expression (assigned to the result variable) it returns value from IQuerable. Up to now this expression is just an expression – it doesn’t execute anything against the SQL server. However on the next line we have a call of the ToList() method which idea is to return list of SampleObjects. Thus here on this line we have a DB query which fetches the requested list.
Here we should solve two problems. The performance one is obvious – we fetch everything from the DB just to count it, which will impact the memory usage as well as the network traffic. The second problem is that we want to stick up to the Don’t Repeat Yourself principle (DRY, also known as Single Point of Truth and Single Point of Maintenance) – we don’t want to have the query for the GetAll() and the GetCount() twice. In practice it will differ only by the Count() method. Keep in mind that we could have more complex queries with filters, skips, orders, etc. so this could be very bad if we don’t obey the DRY principle here. The above snippet tried to solve this second note, but in a wrong way.
The following code snippet shows sample solution for both of the mentioned problems:

It just uses third method, which is private for the class and returns the common part of the expression as IQueriable. The two public methods just appends the query with a ToList() and Count() calls.

2. SelectMethod(…) and SelectCountMethod(…) for paging – this sample is similar to the above, but again is very common mistake.

This implementation totally breaks one of the main reasons of using paging – loading only a part of the whole content. You have enough knowledge from the point 1 both for finding the problem and for solving it.

3. The N+1 query problem – a nightmare from the past. This is a very performance intensive problem which becomes a classical in the software development folklore. The idea behind this problem is very straightforward – imagine that we have a list of SampleObjects and for each list we want to get which is his owner – SampleObjectOwner. Thus one could write one query for retrieving all of the SampleObjects (let’s say that their count is N) and for each object, we ask which is its owner. This means that we will make 1 + N queries for retrieving the needed data.
I will show you an even worst situation, in which we have to find the first object (if such exists) which have particular SampleObjectOwner as an owner.

The mistake here is similar to the above two, but here it’s the worst situation - the method GetByOwner(…) needs as much quires as the number of the sampleObjects + 1. This means that in the future, while your DB grows the performance will become worst and worst. The problem’s fix is so ironically easy – just compare…

The above code produces only a single query.

General advice
We have seen some problems, which weren’t found during the development process. The question is how to prevent ourselves from such things?
Firstly – one should definitely not use technology or framework in production without a good research. As we just have seen the problems have very straightforward and logical explanation. We should confess that not reading enough is a problem of many developers.
Second – as I mentioned in my previous post – use class like DebuggerWritter for profiling every method. Normally such problems could be found very easily just with one testing. Yes – you will need to have at least a vague idea about what kind of SQL statement should be composed by LINQ to SQL but this is a normal requirement for each software developer.
Third – stick up to the techniques shown in the solutions above. They represent a good pattern for writing DAL logic with LINQ to SQL.

Conclusion
Hopefully, now you should know more about LINQ to SQL and how to avoid some of the problems, which are commonly seen in the developers’ code. It will be very helpful to add your comments to that topic showing problems that you have had/seen in your work.

Thanks,
Vesko Kolev

 
Vesko Kolev's Blog : IDeveloper -