Mastering Django ORM: From Models to Advanced Queries

6 min read2 days ago

Introduction

Before diving into Django’s powerful ORM capabilities, it’s essential to understand the data structure we’ll be working with. In this comprehensive guide, we’ll first define our models, then explore the full range of QuerySet operations available in Django ORM.

1. Defining Our Models

Let’s create a blog application with the following models:

from django.db import models
from django.contrib.auth.models import User

# Author Model
class Author(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    bio = models.TextField()
    birth_date = models.DateField(null=True, blank=True)
    website = models.URLField(blank=True)
    
    def __str__(self):
        return self.user.get_full_name()

# Category Model
class Category(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(unique=True)
    description = models.TextField(blank=True)
    
    class Meta:
        verbose_name_plural = "categories"
        
    def __str__(self):
        return self.name

# Post Model
class Post(models.Model):
    DRAFT = 'DF'
    PUBLISHED = 'PB'
    ARCHIVED = 'AR'
    
    STATUS_CHOICES = [
        (DRAFT, 'Draft'),
        (PUBLISHED, 'Published'),
        (ARCHIVED, 'Archived'),
    ]
    
    title = models.CharField(max_length=200)
    slug = models.SlugField(unique_for_date='publish_date')
    content = models.TextField()
    publish_date = models.DateTimeField(auto_now_add=True)
    last_updated = models.DateTimeField(auto_now=True)
    status = models.CharField(
        max_length=2,
        choices=STATUS_CHOICES,
        default=DRAFT
    )
    author = models.ForeignKey(
        Author,
        on_delete=models.CASCADE,
        related_name='posts'
    )
    categories = models.ManyToManyField(
        Category,
        related_name='posts',
        blank=True
    )
    featured_image = models.ImageField(
        upload_to='posts/',
        null=True,
        blank=True
    )
    view_count = models.PositiveIntegerField(default=0)
    is_premium = models.BooleanField(default=False)
    
    class Meta:
        ordering = ['-publish_date']
        indexes = [
            models.Index(fields=['-publish_date']),
        ]
    
    def __str__(self):
        return self.title

# Comment Model
class Comment(models.Model):
    post = models.ForeignKey(
        Post,
        on_delete=models.CASCADE,
        related_name='comments'
    )
    name = models.CharField(max_length=100)
    email = models.EmailField()
    content = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)
    approved = models.BooleanField(default=False)
    
    class Meta:
        ordering = ['created_at']
    
    def __str__(self):
        return f"Comment by {self.name} on {self.post.title}"

2. Basic QuerySet Operations

2.1 Retrieving All Objects

# Get all posts
all_posts = Post.objects.all()

# Get all published posts
published_posts = Post.objects.filter(status=Post.PUBLISHED).all()

The .all() method returns a QuerySet containing all objects that match the query. While it's optional (as Post.objects.filter() would work the same), it makes the intention explicit.

2.2 Filtering Objects

# Get posts by a specific author
author_posts = Post.objects.filter(author__user__username='johndoe')

# Get posts with more than 1000 views
popular_posts = Post.objects.filter(view_count__gt=1000)

# Get posts published in 2023
from datetime import datetime
posts_2023 = Post.objects.filter(
    publish_date__year=2023,
    status=Post.PUBLISHED
)

The .filter() method returns a QuerySet containing objects that match the given conditions. While you can chain multiple filters, each one narrows down the results further.

2.3 Excluding Objects

# Get all posts except drafts
non_draft_posts = Post.objects.exclude(status=Post.DRAFT)

# Get posts not by specific authors
non_john_posts = Post.objects.exclude(author__user__username__in=['johndoe', 'janedoe'])

# Complex exclusion
recent_non_premium = Post.objects.filter(
    publish_date__year=2023
).exclude(
    is_premium=True
)

The .exclude() method does the opposite of .filter()— it returns objects that do not match the specified conditions, making it useful for negative filtering.

3. Advanced Field Lookups

3.1 Null/Blank Field Handling

# Posts without a featured image
no_image_posts = Post.objects.filter(featured_image__isnull=True)

# Posts with empty content (both None and empty string)
empty_content_posts = Post.objects.filter(content__exact='')

# Alternative for checking empty strings
empty_content_posts = Post.objects.filter(content='')

The __isnull lookup checks whether a field is NULL (or empty). Use field__isnull=True to find empty values and False for non-empty ones.

3.2 String Matching

# Case-sensitive title starts with "The"
the_posts = Post.objects.filter(title__startswith="The")

# Case-insensitive title contains "python"
python_posts = Post.objects.filter(title__icontains="python")

# Title ends with "guide" (case-sensitive)
guide_posts = Post.objects.filter(title__endswith="guide")

# Using regex (case-sensitive)
regex_posts = Post.objects.filter(title__regex=r'^[A-Z]')  # Starts with capital letter

# Using iregex (case-insensitive)
iregex_posts = Post.objects.filter(title__iregex=r'^the')  # Starts with "the" in any case

The __startswith lookup matches values that begin with a given string (case-sensitive), while __istartswith does the same case-insensitively.
title__endswith="guide" - Finds posts where the title exactly ends with 'guide' (case-sensitive), useful for matching specific title patterns.
title__regex=r'^[A-Z]' - Uses a regular expression to match posts with titles starting with a capital letter (case-sensitive), ideal for enforcing title formatting rules.
title__iregex=r'^the' - Performs case-insensitive regex matching to find titles starting with 'the' (e.g., 'The', 'THE', 'the'), helpful for flexible text searches.

3.3 Date/Time Lookups

from datetime import date, timedelta

# Posts published today
today = date.today()
today_posts = Post.objects.filter(publish_date__date=today)

# Posts published this week
week_ago = today - timedelta(days=7)
recent_posts = Post.objects.filter(publish_date__date__gte=week_ago)

# Posts published between 9am and 5pm
working_hours_posts = Post.objects.filter(
    publish_date__hour__gte=9,
    publish_date__hour__lte=17
)

3.4 Complex Queries with Q Objects

from django.db.models import Q

# Posts about Python OR Django
python_or_django = Post.objects.filter(
    Q(title__icontains='python') | Q(title__icontains='django')
)
# Premium posts about Python but not about Django
premium_python = Post.objects.filter(
    Q(is_premium=True) & 
    Q(title__icontains='python') & 
    ~Q(title__icontains='django')
)
# Combining Q objects with regular filters
popular_python = Post.objects.filter(
    Q(title__icontains='python'),
    view_count__gt=1000,
    status=Post.PUBLISHED
)

Q objects allow complex queries with AND (&), OR (|), and NOT (~) conditions, enabling advanced filtering beyond simple .filter() calls
The ~ operator acts as NOT in Q object expressions.

3.5 Aggregation and Annotation

from django.db.models import Count, Avg, Max, Min, Sum, F

# Number of posts per author
from django.db.models import Count
author_stats = Author.objects.annotate(
    post_count=Count('posts'),
    avg_views=Avg('posts__view_count')
)

# Increment view count for a post
Post.objects.filter(pk=1).update(view_count=F('view_count') + 1)

# Most popular categories
popular_categories = Category.objects.annotate(
    post_count=Count('posts'),
    avg_views=Avg('posts__view_count')
).order_by('-post_count')[:5]

F() references a field's value directly in queries, allowing comparisons between fields or arithmetic operations (e.g., incrementing a counter).
.aggregate() computes global values (e.g., Sum, Avg) across a QuerySet, while .annotate() adds calculated fields to each object in the results.

4. Performance Optimisation

4.1 select_related (for ForeignKey and OneToOne)

# Efficiently fetch post with author info (single query)
posts_with_authors = Post.objects.select_related('author').all()

# Deep relationship (author and user)
posts_with_user = Post.objects.select_related('author__user').all()

select_related optimises foreign-key/one-to-one queries (single SQL join).

4.2 prefetch_related (for ManyToMany and reverse ForeignKey)

# Efficiently fetch posts with categories
posts_with_cats = Post.objects.prefetch_related('categories').all()

# Multiple relationships
detailed_posts = Post.objects.select_related('author').prefetch_related(
    'categories',
    'comments'
).all()

prefetch_related efficiently handles many-to-many/reverse relations (separate queries)

4.3 only() and defer()

# Only load specific fields
lightweight_posts = Post.objects.only('title', 'publish_date', 'author')

# Defer loading of heavy fields
posts_without_content = Post.objects.defer('content')

.only() loads only specified fields (optimising queries).
.defer() does the opposite—loading all fields except the specified ones.

4.4 Raw SQL Queries

# Simple raw query
raw_posts = Post.objects.raw('SELECT * FROM blog_post WHERE status = "PB"')

# Parameterized query (safe from SQL injection)
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute("""
        SELECT title, publish_date 
        FROM blog_post 
        WHERE view_count > %s
    """, [1000])
    rows = cursor.fetchall()

# Mapping raw query to model
raw_with_params = Post.objects.raw(
    'SELECT * FROM blog_post WHERE view_count > %s',
    [1000]
)

# Complex raw query with joins
complex_query = """
    SELECT p.* 
    FROM blog_post p
    JOIN blog_author a ON p.author_id = a.id
    JOIN auth_user u ON a.user_id = u.id
    WHERE u.username = %s AND p.status = %s
"""
user_posts = Post.objects.raw(complex_query, ['johndoe', 'PB'])

The .raw() method executes raw SQL queries when Django ORM isn’t sufficient, returning model instances just like regular QuerySets.

5. Best Practices

5.1 Chain filters instead of multiple queries:

# Good 
queryset = Post.objects.filter(status=Post.PUBLISHED) 
queryset = queryset.filter(publish_date__year=2023) 

# Bad (executes two queries) 
queryset = Post.objects.filter(status=Post.PUBLISHED).all() 
queryset = Post.objects.filter(publish_date__year=2023).all()

5.2 Use iterator() for large querysets:

for post in Post.objects.all().iterator():
  process_post(post)

5.3 Use count() instead of len() for querysets:

# Good (database count) 
count = Post.objects.count() 

# Bad (loads all objects into memory) 
count = len(Post.objects.all())

5.3 Avoid N+1 problems with select_related/prefetch_related

6. What is the N+1 Query Problem?

The N+1 query problem is a common performance issue in database-driven applications (like Django) where:

1 initial query fetches a list of objects (e.g., blog posts).
N additional queries are then executed to fetch related data for each object (e.g., the author details for each post).

This leads to poor performance because instead of 1 optimised query, the app makes N+1 queries (where N is the number of objects retrieved).

6.1 select_related() - For ForeignKey/OneToOne

Scenario: Fetch all blog posts and display each post along with its author’s name.

Problematic Code (N+1 Queries)

# 1 Query: Get all posts
posts = Post.objects.all()

for post in posts:
    # N Queries: Get author for each post (1 query per post!)
    print(f"Post: {post.title}, Author: {post.author.name}")

If there are 100 posts, this executes 101 queries (1 for posts + 100 for authors).

Optimised Code (1 Query)

# Uses JOIN to fetch posts + authors in 1 query
posts = Post.objects.select_related('author').all()

for post in posts:
    # No additional queries (author is already loaded)
    print(f"Post: {post.title}, Author: {post.author.name}")

Only 1 SQL query is executed (using a JOIN).

6.2 prefetch_related() - For ManyToMany/Reverse Relations

Scenario: Fetch all posts and their categories (many-to-many relationship).

Problematic Code (N+1 Queries)

posts = Post.objects.all()

for post in posts:
    # N Queries: Get categories for each post
    print(f"Post: {post.title}, Categories: {post.categories.all()}")

Optimised Code (2 Queries)

# 1 Query: Get posts + 1 Query: Get all related categories
posts = Post.objects.prefetch_related('categories').all()

for post in posts:
    # No additional queries (categories are prefetched)
    print(f"Post: {post.title}, Categories: {post.categories.all()}")

Total queries: 2 (1 for posts, 1 for categories), regardless of the number of posts.

Conclusion

Django ORM turns complex database queries into simple Python code. Instead of writing long SQL statements, you can use:

Basic filters (filter(), exclude(), get()) to find exactly what you need.
Smart optimisations (select_related(), prefetch_related()) to speed up your app.
Handy tricks (annotate(), aggregate()) for calculations without extra code.

No need to overcomplicate things — Django ORM does the heavy lifting for you. Keep experimenting, and soon querying databases will feel effortless!

That’s it! Now go build something awesome. 🚀