Inherent Performance Problem with Object to Linq Extension Methods

Linq and extension methods are wonderful features, since they allow extending objects without having to modify the original types. Extension methods are special types of static methods, but they are called as if they were instance methods. Most prominent examples include Linq to Object where there are a plethora of methods available for IEnumerable type.

However since many types implement IEnumerable interface this inadvertently creates a problem. For example an array of int is IEnumerable and we could call a Count() extension method on this array. However this approach has significant performance drawbacks in comparison to simply calling .Length on the array. The problem becomes obvious if we look at the implementation of the Count() extension method:

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull(nameof(source));
    }

    if (source is ICollection<TSource> collectionoft)
    {
        return collectionoft.Count;
    }

    if (source is IIListProvider<TSource> listProv)
    {
        return listProv.GetCount(onlyIfCheap: false);
    }

    if (source is ICollection collection)
    {
        return collection.Count;
    }

    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        checked
        {
            while (e.MoveNext())
            {
                 count++;
            }
        }
    }
    
    return count;
}

Because the extension method is applied on any IEnumerable it has to be very clever in how it computes the count: if it’s a collection then we can simply return the .Count otherwise we have to enumerate through all elements to determine the count. Just because of it’s size this method will not be inlined by the compiler and the cost of using .Count() vs .Length on an array type is going to be very high.

One solution to this problem is to implement Linq extensions for array types separately. In which case the implementation of .Count() becomes very simple and is likely to be inlined:

public static int Count<TSource>(this TSource [] items)
{
     return items.Length;
}

But then there are many other types where the Count could be easily calculated by calling .Count property, such as List<T>, IList<T>, ICollection<T>, ImmutableList<T> etc. The proper solution would be to implement all the Linq to Object extension methods for each of the types separately to ensure the best performance. Even in the case of List<T> vs IList<T> an implementation for the specific type would be slightly faster.

In other Linq to Object extension methods we could improve performance by substituting foreach with for which is normally slightly faster.

However this inevitably results in explosion of the codebase that has to be maintained. One plausible solution, while maintaining all the benefits of having an extension method for each type, is to write a template and then use t4 files to create a specific implementation for each type.

Taking the implementation challenges aside, integrating such a solution into an existing application is extremely trivial. We just need to add the assembly and provided that it uses the same namespace as the original Linq to Object extensions, recompiling the application would be enough for the compiler to pick up the “better” suited extension methods, resulting in immediate performance benefits.

I have started such project on GitHub

https://github.com/ebalynn/FasterLinq

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

Up ↑

%d bloggers like this: