Entity Framework Performance: 3 Things You Must Consider

I hear it all the time: Entity Framework is slow, Entity Framework can't handle this kind of volume, We need to rip out Entity Framework for regular SQL. In some cases this is necessary, but let me demonstrate a few easy ways to make sure you're eeking the most performance out of your Entity Framework queries.

Introduction - Entity Framework and Performance

Entity Framework has been the object relational mapper (ORM) of choice for .NET applications for a long time now. It allows developers a simple way to persist their model objects within application code without the need for managing SQL every single time for simple Create, Read, Update, and Delete (CRUD) operations. Like any abstraction, there's often a performance penalty to pay, and Entity Framework is no exception.

My goal with this article is to show that the performance penalty for using Entity Framework for some of the most common SQL operations isn't all that much, and there are a handful of techniques .NET developers can use to get very good to great performance out of their Entity Framework applications.

I will demonstrate how to easily fix some common Entity Framework pitfalls that result in bad performance. These fixes are easy to implement and will help ensure we are writing scalable data-access code in our .NET applications. Here's a quick overview of the performance optimization techniques we can apply.

Know when Entity Framework executes SQL queries. Let the database do what it does best.
Avoid lazy loading and executing N+1 queries. Round trip times to your database management system (DBMS) can really add up quickly.
Query only what you need. Entity Framework makes it very easy to query data, and LOTS of it. Let's make sure we're not asking our DBMS to do too much.

Setting Up the Entity Framework Core App

I will be using Entity Framework Core 2.2 inside of a simple .NET Core 2.2 console application to demonstrate the performance fixes. I've published the code to a repo on my GitHub account. Despite being Entity Framework Core 2.2 running on netcoreapp2.2, these examples are still relavant to both newer versions of Entity Framework Core (3.x) and the older Entity Framework 6.x. The syntax for the operations I'm going to demonstrate are all very similar between the different versions, and the concepts remain the same.

Entity Framework Core, in general, performs better than the older Entity Framework 6.x. Check out my blog post here to see benchmarks that demonstrate the difference in performance from Entity Framework 6.x to Entity Framework Core 3.x.

I will be using an instance of SQL Server 2017 Express for my database. The database I will use to demonstrate these topics is called BookDB and it consists of three tables: books, authors, and book copies. An author can write any number of books. A book has one author (many to 1); one author can write many books. A book can have any number of copies. (1 to many).

Seeding the Data

For seeding the data, I wrote a SQL script to seed 1,000 authors, 10,000 books, and 1,000,000 book copies. With this volume of data we should easily be able to start to get an idea of bad performance from poorly written queries.

Inside the console application I will talk to my BooksDB in typical Entity Framework fashion using the Db class that inherits from Entity Framework's DbContext. For convenience, I've included my Book, Author, and Copy model classes in the same file.

I've enabled lazy loading, which I will demonstrate soon, by including a call to .UseLazyLoadingProxies() in the options builder, and marking the nagivation properties in my entites classes as virtual. Lazy loading has been featured in past Entity Framework versions, but it was first introduced to Entity Framework Core with version 2.1.

using Microsoft.EntityFrameworkCore;
using System.Collections.Generic;

namespace EFPerformance.Database
{
    public class Db : DbContext
    {
        public DbSet Books { get; set; }
        public DbSet Authors { get; set; }
        public DbSet Copy { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) => optionsBuilder
                .UseLazyLoadingProxies()
                .UseSqlServer(@"Server=.\SQLEXPRESS;Database=BooksDB;Trusted_Connection=True;");

    }

    public class Book {
        public int BookId { get; set; }
        public string Name { get; set; }

        public int AuthorId { get; set; }
        public virtual Author Author { get; set; }

        public virtual IList Copies { get; set; }
    }

    public class Author
    {
        public int AuthorId { get; set; }
        public string FullName { get; set; }
    }

    public class Copy
    {
        public int CopyId { get; set; }
        public virtual Book Book { get; set; }
        public decimal Price { get; set; }
    }
}

Know When Entity Framework Executes SQL Queries

One of the easiest "mistakes" to make, especially for newcomers to Entity Framework or ORMs in general, is to prematurely execute a SQL query, and do a large filter or sort to a huge number of objects in memory. Almost all DBMS are very good at sorting and filtering provided there's good index usage, so our DBMS can often handle these operations much more efficiently than our application code can with the added bonus we can reduce our network overhead by not having to transfer a large amount of records over the wire.

The key to avoiding this performance pitfall is to pay close attention to where Entity Framework calls the database (most often by calling .ToList()). Ideally, we want to build up our query as an IQueryable<EntityType> and then Entity Framework will translate our IQueryable into SQL that's compatible with our DBMS provider of choice.

The Wrong Way

In the example below, we will call s.ToList() directly on Books. This will retrieve every book our database has. The responsibility of sorting or filtering this data now lies within our application code. If you have enough Books in your database, this could place a tremendous burden on your .NET application because it's not optimized to handle data in the way a proper DBMS can.

var books = db.Books.ToList();
var matchingBooks = books
    .Where(b => b.Name == "EF Core for Smarties");

The first line executes a query against BooksDB equivalent to SELECT * FROM Books which in our case loads all 10,000 books into our application (ouch!). We then filter the list of books to find books named EF Core for Smarties in a LINQ query.

Luckily we're only loading 10,000 (very simple) book records into memory here. Imagine deploying to production where there could be millions upon millions of books. It wouldn't take long before we realized we have a problem on our hands. I timed the execution of this code 13 times (the first two or three calls require additional .NET and EF overhead, so we have 10 good results) in my console application, and here are the results.

Finished in 1654ms
Finished in 122ms
Finished in 114ms
Finished in 101ms
Finished in 104ms
Finished in 97ms
Finished in 112ms
Finished in 117ms
Finished in 98ms
Finished in 100ms
Finished in 105ms
Finished in 98ms
Finished in 112ms

Luckily, my .NET Core compiled console application can still handle 10,000 books rather quickly at about 100 milliseconds per try, but let's see how easy it is to make this perform much faster.

The Right Way

In this example we instead do the filtering prior to calling .ToList() using LINQ's .Where(). Doing it this way will build up our IQueryable to include the LINQ .Where(), so Entity Framework can compile the query into SQL that includes the SQL WHERE clause. Here's what the C# code looks like.

var books = db.Books
    .Where(b => b.Name == "EF Core for Smarties")
    .ToList();

Entity Framework Core will translate the LINQ expression and execute the following SQL query for us.

SELECT [b].[BookId], [b].[AuthorId], [b].[Name]
FROM [Books] AS [b]
WHERE [b].[Name] = N'EF Core for Smarties'

Entity Framework has included the WHERE clause for the book's name. Let's see how this performs.

Finished in 70ms
Finished in 6ms
Finished in 3ms
Finished in 3ms
Finished in 3ms
Finished in 2ms
Finished in 2ms
Finished in 3ms
Finished in 2ms
Finished in 2ms
Finished in 4ms
Finished in 3ms
Finished in 3ms

While we were performing okay at 100 milliseconds per run earlier, we're now performing much faster at about 2-3 milliseconds per run! That's very nearly a two orders of magnitude improvement. For basic filtering and sorting, you almost always should let your DBMS do it's thing by building up your IQueryable with calls to .Where() for filtering and .OrderBy() for sorting before calling .ToList(), which will ultimately compile your query and send it off to your DBMS.

Optimizing Further

We could further increase the performance of this code by indexing the Name column within the Books table, and using .AsNoTracking() if we're using the query results in a read-only manner.

Avoid Lazy Loading and N+1 Queries!

Another mistake that's simple to overlook, even for experienced developers, is allowing excessive repetitive queries to execute via lazy loading. Microsoft defines lazy loading in Entity Framework as "the process whereby an entity or collection of entities is automatically loaded from the database the first time that a property referring to the entity/entities is accessed."

In our Book object we have a reference to Author, but when I query for a Book, and then I try to do something like var bookAuthorName = book.Author.FullName, Entity Framework will only load the Book object at first, but when we access the FullName of the Author property, Entity Framework will have to query the database for the author record tied to the book record we're using. Two queries get executed against the database, one to retrieve the book record, and the lazy load when accessing the Author property.

Consider we're dealing with a lot of books, and we want information about the book's author. Provided there's lot of unique author's and we have a large list of books, this could result in a lot of repetitive queries and round trips to the database. This will execute N+1 queries because we query a list of a books (1) and we have at most one additional query for each unique author (N).

The Wrong Way

In this example we'll create a simple query for a list of books, and then loop through each book to do something with the Author's full name.

var books = db.Books
    .OrderBy(b => b.BookId)
    .Take(1000)
    .ToList();

foreach (var book in books)
{
    var doSomethingWithName = book.Author.FullName;
}

This looks harmless right? Let's see how long it takes to execute this code.

Lazy loading: Finished in 2284ms
Lazy loading: Finished in 443ms
Lazy loading: Finished in 377ms
Lazy loading: Finished in 380ms
Lazy loading: Finished in 385ms
Lazy loading: Finished in 379ms
Lazy loading: Finished in 391ms
Lazy loading: Finished in 424ms
Lazy loading: Finished in 398ms
Lazy loading: Finished in 383ms
Lazy loading: Finished in 417ms
Lazy loading: Finished in 409ms
Lazy loading: Finished in 427ms

Yikes. Each run is taking around 400 milliseconds which doesn't sound all that bad. But do consider I'm running this test where my application is running on the same machine as my DBMS, so my round trip time is going to be much lower than a proper production setup where the application server and database server are going to be geographically further apart.

Whilst capturing queries on my SQL Server Profiler, which I would recommend using alongside Entity Framework in order to see what your queries look like, I counted 8048 queries ran against BooksDB as a result of the 13 tests. That's nearly 613 queries executed against the database each time this code is run! The following SQL made up the vast majority of the queries.

SELECT [e].[AuthorId], [e].[FullName]
FROM [Authors] AS [e]
WHERE [e].[AuthorId] = @__get_Item_0

It's a very simple query that the DBMS is very good at handling for sure, but the round trip times can certainly be a performance killer for your .NET application.

The Right Way

Luckily for us, Entity Framework Core provides an easy tool for dealing with situations like this. In this situation, it would be far more efficient to eager load and grab all the Author data up front in one single round trip as opposed to 613 as we saw in our lazy loading.

In our code, we need to instruct Entity Framework to eager load each books' Author property by using .Include. Here's an example of this technique below.

var books = db.Books
    .Include(b => b.Author)
    .OrderBy(b => b.BookId)
    .Take(1000)
    .ToList();

foreach (var book in books)
{
    var doSomethingWithName = book.Author.FullName;
}

That wasn't so bad; it's only one line of code. Let's time this several times using our new approach.

Eager loading: Finished in 1782ms
Eager loading: Finished in 56ms
Eager loading: Finished in 28ms
Eager loading: Finished in 23ms
Eager loading: Finished in 29ms
Eager loading: Finished in 26ms
Eager loading: Finished in 22ms
Eager loading: Finished in 25ms
Eager loading: Finished in 32ms
Eager loading: Finished in 23ms
Eager loading: Finished in 32ms
Eager loading: Finished in 26ms
Eager loading: Finished in 22ms

Aah, much better. 25 or so milliseconds is a big improvement in performance from 400 milliseconds. Taking a glance at my SQL Server Profiler, I see that it traced 13 queries for our 13 tests instead of the 8048 from lazy loading previously. Very nice. Below is an output of the SQL Entity Framework generated for us. Notice Entity Framework performed a SQL JOIN for us in the eager loading generated SQL.

SELECT TOP(@__p_0) [b].[BookId], [b].[AuthorId], [b].[Name], [b.Author].[AuthorId], [b.Author].[FullName]
FROM [Books] AS [b]
INNER JOIN [Authors] AS [b.Author] ON [b].[AuthorId] = [b.Author].[AuthorId]
ORDER BY [b].[BookId]

This SQL will perform just fine, but we can make this even more performant.

Query Only What You Need

Entity Framework makes it very, very easy for developers to query data from the database. It's almost too easy to write LINQ queries like we did in the above examples. All we were really doing was retrieving the FullName property of each book's author.

We can take our performance and efficiency to the next level by making use of a projection using LINQ's .Select(). This works very much like the SELECT clause in SQL. We use .Select() to grab only the properties we need. By doing this we're ensuring we aren't wasting resources loading a bunch of properties or columns we're not going to use.

In the examples above, we were essentially issuing SELECT * FROM TableName which is hardly ideal for almost any DBMS. SELECT *'s are typically not optimal due to:

Unless you need absolutely every column from every record, you're increasing the payload the DBMS has to deliver across the network to your application server. It's potentially wasteful.
Index usage. Your DBMS may not have an index (and it probably shouldn't in a lot of cases) that covers all the columns in your query, so the DBMS will have to work harder.

In addition, if we grab only primitive type properties (such as int, string, etc.), we're no longer dealing with the Entity Framework change tracker, since there's essentially no entities to keep track of, so we get to avoid a lot of overhead (without needing to use .AsNoTracking()). Here's some sample code that accomplishes the same thing we did earlier, only we're using the LINQ .Select() to project the Books query into a list of anonymous objects whose only property is the author's full name. Data-transfer objects (DTOs) or MVC ViewModel-esque classes work great in this scenario as well if an anonymous object isn't what you're looking to populate.

var books = db.Books
    .OrderBy(b => b.BookId)
    .Take(1000)
    .Select(b => new
    {
        b.Author.FullName
    })
    .ToList();

foreach (var book in books)
{
    var doSomethingWithName = book.FullName;
}

This is more code we need to write, for sure, but let's look at the SQL statement Entity Framework executes for us.

SELECT TOP(@__p_0) [b.Author].[FullName]
FROM [Books] AS [b]
INNER JOIN [Authors] AS [b.Author] ON [b].[AuthorId] = [b.Author].[AuthorId]
ORDER BY [b].[BookId]

Very nice! Now we're only querying the FullName column we've referenced in our .Select projection. There's no waste here, and our DBMS has a very good opportunity to run a highly optimize query. Let's time some runs of this code, and see how the performance stands up to the lazy and eager loading approaches.

Projection: Finished in 1684ms
Projection: Finished in 32ms
Projection: Finished in 2ms
Projection: Finished in 1ms
Projection: Finished in 2ms
Projection: Finished in 1ms
Projection: Finished in 1ms
Projection: Finished in 1ms
Projection: Finished in 2ms
Projection: Finished in 2ms
Projection: Finished in 2ms
Projection: Finished in 1ms
Projection: Finished in 1ms

Brilliant! It's taking about 1 or 2 milliseconds, down from over 400 milliseconds. About 400x times faster! That's an easy performance improvement if I've ever seen one. Navigating the properties inside a projection is a wonderful way to optimize performance for Entity Framework queries whose data is intended to be presented in a read-only fashion.

Conclusion

We looked a three simple ways to improve the performance of Entity Framework Core (and these concepts apply older versions of Entity Framework as well).

Avoid working with too much data in-memory. Let the database do what it's best at doing by building up your IQueryable with .Where()'s, .OrderBy()'s, and .Select()'s, and then calling .ToList() rather than vice versa.
Favor eager loading over lazy loading when we need to access navigation properties for less round trips to the database.
Use projection with .Select() when we know we don't need every property to minimize network payload sizes and produce more efficient queries. While these fixes won't address everyone's performance issues they may be having in their Entity Framework applications, these issues are very easy fixes for what could have severe performance consequences.

Happy Coding!

Entity Framework Performance: 3 Things You Must Consider

Introduction - Entity Framework and Performance

Setting Up the Entity Framework Core App

Seeding the Data

Know When Entity Framework Executes SQL Queries

The Wrong Way

The Right Way

Optimizing Further

Avoid Lazy Loading and N+1 Queries!

The Wrong Way

The Right Way

Query Only What You Need

Conclusion

Read Next

Multi-Tenanted Entity Framework Core Migration Deployment

Multi-Tenanted Entity Framework 6 Migration Deployment

Entity Framework 6 vs Entity Framework Core 3: Comparing Performance