Django and multiple joins on the same table

I picked up this trick back using Wordpress, which I always kinda liked, you scenario: you have a Content model and a Date model, but the Date model is generically-related:

#django models
class Date(models.Model):
    # GR stuff
    content_type = models.ForeignKey(ContentType, null=True, blank=True)
    object_id = models.PositiveIntegerField(null=True, blank=True)
    content_object = generic.GenericForeignKey()

    # actual data
    key = models.CharField(max_length=75)
    date = models.DateTimeField()

class Content(models.Model):
    title = models.CharField(max_length=75)
    body = models.TextField()

    dates = generic.GenericRelation(Date)

The reason you would do this is so you could have as many dates related to the content as you wanted (you don't have to use a generic relation for this, but I like to use dates on lots of things).

What fun

I'm a little eccentric, I like to do wierd things, like with Django's .extra() method on querysets, you can do some funky stuff, like pull several of the generically-related dates based on their 'key' value, and add that 'key' value directly to the Content object as though it were a native attribute. Something like this:

>>> from myapp.models import Content, Date
>>> from datetime import datetime
>>> content = Content(title="Hello", body="My content :)")
>>> content.save()
>>> post_date = Date(key='post_date', date=datetime.now())
>>> post_date.save()
>>> content.dates.add(post_date)
>>> modified_date = Date(key='modified_date', date=datetime.now))
>>> modified_date.save()
>>> content.dates.add(modified_date)

Now the trick here is to join the myapp_date table on to our content twice, which is easy enough:

>>> content = Content.objects.all()
>>> content.filter(dates__key='post_date')
>>> content.filter(dates__key='modified_date')

But now we have a problem

Because django is so wonderful, it does all the work of joining tables and keeping track of those pesky aliases for you, but how do we select the dates from the joined tables (using .extra(select={xxx})) if we don't know the aliases? Well we can spit out the query, as mentioned in the django docs, and that works fine (for now):

>>> print content.query # live on the edge...
# the result is something like:
SELECT "myapp_content"."title", "myapp_content"."body" FROM "myapp_content" INNER JOIN "myapp_date" ON ("myapp_content"."id" = "myapp_date"."object_id") INNER JOIN "myapp_date" T2 ON ("myapp_content"."id" = T2."object_id") WHERE ("myapp_date"."key" = "post_date" AND T2."key" = "modified_date")
# Disclaimer: I wrote this query by hand, it's an approximation, yours may differ, also note: it's pretty damn close.

Now if you were just feeling adventurous, you could simply nab the alias names from that query and grab the extra data as needed:

>>> content.extra(select={'post_date': 'myapp_date.date', 'modified_date': 'T2.date'})
>>> print content.query
# the result is something like:
SELECT (myapp_date.date) AS post_date, (T2.date) AS modified_date, "myapp_content"."title", "myapp_content"."body" FROM "myapp_content" INNER JOIN "myapp_date" ON ("myapp_content"."id" = "myapp_date"."object_id") INNER JOIN "myapp_date" T2 ON ("myapp_content"."id" = T2."object_id") WHERE ("myapp_date"."key" = "post_date" AND T2."key" = "modified_date")
# extra(select={}) just pops a couple extra selects in there, like it should, but yuck, static references ie: T2
>>> content[0].post_date
datetime.datetime(2010, ...)
>>> content[0].modified_date
datetime.datetime(2010, ...)

Fun huh?

But, there is a gotcha...

I had been employing this method in an application, and arbitrarily decided to upgrade my django. Lo' and behold, the alias names changed! Whooda thought? So after much turmoil and source-code-diggery, I found the way to get those pesky aliases (and therefore perform the query) without all the guilt. The trick was I had to find the place in django.db.models.sql.query.Query where the table alias (T2) was related to the part in the where clause that was related to the field (post_until). The answer was in the content object's where clause, something like:

>>> content.query.where.children
[(<django.db.models.sql.where.constraint object="object" at="at">, 'exact', True, 'post_date'),(</django.db.models.sql.where.constraint><django.db.models.sql.where.constraint object="object" at="at">, 'exact', True, 'modified_date'),...]
>>> content.query.where.children[0].alias # < the constraint holds the key, literally
'myapp_date'
>>> content.query.where.children[1].alias
'T2'
</django.db.models.sql.where.constraint>

And there you have it

So my quick solution to keep the aliases produced by they code referenced dynamically, I just build a quick little dict out of the 'where.children' so I could reference the aliases properly later...

# summed up 
>>> content = Content.objects.all()
>>> content.filter(dates__key='post_date')
>>> content.filter(dates__key='modified_date')
>>> alias_map = {}
>>> for condition in content.query.where.children:
...    # alias_map['modified_date'] = 'T2'
...    alias_map[condition[3]] = condition[0].alias 

>>> content.extra(select={ \
...    'post_date': '%s.date' % alias_map['post_date'], \
...    'modified_date': '%s.date' % alias_map['modified_date']})

And with that, the keys end up as they should. Nothing spectacular, it just took some digging to find where the join "black magic" was happening in django. Thanks to jtiai on #django for the pointers.

Hope that really helps someone down the line,

Aaron

Posted

Jan 22, 2010
by AnonymousUser

Categories

General Posts