Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #35135 -- Made FilteredRelation raise ValueError on querysets as rhs. #17771

Conversation

ticosax
Copy link
Contributor

@ticosax ticosax commented Jan 23, 2024

I didn't found an existing Ticket and I think the patch is small enough to avoid one.
Please let me know otherwise.

https://code.djangoproject.com/ticket/35135#ticket

@ticosax ticosax force-pushed the regression-filtered-relation-queryset-as-predicate branch from 1cc4d5a to 5b18b5d Compare January 23, 2024 11:00
@felixxm
Copy link
Member

felixxm commented Jan 23, 2024

I didn't found an existing Ticket and I think the patch is small enough to avoid one. Please let me know otherwise.

All bugfixes require an accepted ticket. Is it a regression in Django 5.0?

@felixxm
Copy link
Member

felixxm commented Jan 23, 2024

Looks like a regression in 59f4754.

@ticosax
Copy link
Contributor Author

ticosax commented Jan 23, 2024

I didn't found an existing Ticket and I think the patch is small enough to avoid one. Please let me know otherwise.

All bugfixes require an accepted ticket. Is it a regression in Django 5.0?

All good, I will create one. Yes, it is a regression introduced by DJango 5.0, probably introduced by d660cee and subsequent improvements in afc8805 and 59f4754

@ticosax ticosax changed the title Fix FilteredRelation when rhs of join condition is a QuerySet Fixed #35135 -- FilteredRelation when rhs of join condition is a QuerySet Jan 23, 2024
@ticosax ticosax changed the title Fixed #35135 -- FilteredRelation when rhs of join condition is a QuerySet Fixed #35135 -- Fix FilteredRelation when rhs of join condition is a QuerySet Jan 23, 2024
@@ -104,6 +106,8 @@ def get_child_with_renamed_prefix(prefix, replacement, child):
child = child.copy()
if child.name.startswith(prefix + LOOKUP_SEP):
child.name = child.name.replace(prefix, replacement, 1)
elif isinstance(child, QuerySet):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to skip querysets? I think this will make the original issue (ticket-34362) unfixed for querysets 🤔 For example:

        qs = Author.objects.annotate(
            poem_book=FilteredRelation(
                "book",
                condition=Q(
                    book__in=Book.objects.filter(book__title="Poem by Alice")
                ),
            ),      
        ).filter(poem_book__isnull=False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, was not as easy as I thought.
Thanks for the regression test, I will add it to the PR and work on trying to find the proper fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lookup inside the rhs's QuerySet was incorrect (book__title instead of title).
It's working fine now.
(I added your test case in the PR)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that subquery in join condition are even supported on all engines.

I guess the question now becomes what should we do if the queryset need to refer to the filtered relation itself or any one of its right-hand-side lookup? I think that's what @felixxm was referring to.

OuterRef are resolved relative to the main model of the query (Author in this case) so OuterRef("book__*") is allowed and without repointing book to poem_book like it's the case for pretty much everything else in the condition it will spawn a different join.

qs = Author.objects.annotate(
    poem_book=FilteredRelation(
        "book",
        condition=Q(
            book__in=Book.objects.filter(title=OuterRef("book__title"))
        ),
    ),      
).filter(poem_book__isnull=False)

Which should produce the equivalent of

SELECT *
FROM author
LEFT JOIN book poem_book ON (
    poem_book.author_id = author.id
    AND poem_book.id IN (
        SELECT id
        FROM book
        WHERE title = poem_book.title
    )
)
WHERE poem_book.id IS NOT NULL

Maybe it's okay to support queryset but not querysets that have outer references?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what @felixxm was referring to.

Yes, sorry, I wrote it in a hurry.

@ticosax ticosax force-pushed the regression-filtered-relation-queryset-as-predicate branch from cc9f796 to e8442e6 Compare January 23, 2024 14:09
Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked the branch locally but I'm curious what the original crash was given QuerySet is also resolvable to sql.Query?

@@ -104,6 +106,8 @@ def get_child_with_renamed_prefix(prefix, replacement, child):
child = child.copy()
if child.name.startswith(prefix + LOOKUP_SEP):
child.name = child.name.replace(prefix, replacement, 1)
elif isinstance(child, QuerySet):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that subquery in join condition are even supported on all engines.

I guess the question now becomes what should we do if the queryset need to refer to the filtered relation itself or any one of its right-hand-side lookup? I think that's what @felixxm was referring to.

OuterRef are resolved relative to the main model of the query (Author in this case) so OuterRef("book__*") is allowed and without repointing book to poem_book like it's the case for pretty much everything else in the condition it will spawn a different join.

qs = Author.objects.annotate(
    poem_book=FilteredRelation(
        "book",
        condition=Q(
            book__in=Book.objects.filter(title=OuterRef("book__title"))
        ),
    ),      
).filter(poem_book__isnull=False)

Which should produce the equivalent of

SELECT *
FROM author
LEFT JOIN book poem_book ON (
    poem_book.author_id = author.id
    AND poem_book.id IN (
        SELECT id
        FROM book
        WHERE title = poem_book.title
    )
)
WHERE poem_book.id IS NOT NULL

Maybe it's okay to support queryset but not querysets that have outer references?

Comment on lines 109 to 110
elif isinstance(child, QuerySet):
pass

This comment was marked as outdated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it's possibly not doable because we're dealing with unresolved expression at this point.

@felixxm
Copy link
Member

felixxm commented Jan 24, 2024

I haven't checked the branch locally but I'm curious what the original crash was given QuerySet is also resolvable to sql.Query?

Without this patch, it crashes with:

  File "/django/django/db/models/sql/query.py", line 108, in get_child_with_renamed_prefix
    child = child.copy()
AttributeError: 'QuerySet' object has no attribute 'copy'

@ticosax
Copy link
Contributor Author

ticosax commented Jan 24, 2024

I haven't checked the branch locally but I'm curious what the original crash was given QuerySet is also resolvable to sql.Query?

My understanding is that the QuerySet is interpreted as an Expression because it has a resolve_expression method.

...
elif hasattr(child, "resolve_expression"):
...

but the nature of the QuerySet doesn't allow it to be treated like an Expression. Hence the elif isintance(child, QuerySet) is added, to leave the QuerySet as it is (as it was, in previous versions of Django).

@charettes
Copy link
Member

charettes commented Jan 24, 2024

My understanding is that the QuerySet is interpreted as an Expression because it has a resolve_expression method.

The presence of resolve_expression is exactly what defines the expression protocol. There are other objects that are not BaseExpression subclass that implement this protocol.

but the nature of the QuerySet doesn't allow it to be treated like an Expression.

It's treated as an expression in any other context so I don't think your assessment is right. I think that the usage of copy is problematic here.

To me the problem is more than QuerySet doesn't have a copy method but _clone should do.

Do the tests pass with

elif hasattr(child, "resolve_expression"):
    if isinstance(child, QuerySet):
        child = child._clone()
    else:
        child = child.copy()
    ...

@ticosax
Copy link
Contributor Author

ticosax commented Jan 24, 2024

My understanding is that the QuerySet is interpreted as an Expression because it has a resolve_expression method.

The presence of resolve_expression is exactly what defines the expression protocol. There are other objects that are not BaseExpression subclass that implement this protocol.

but the nature of the QuerySet doesn't allow it to be treated like an Expression.

It's treated as an expression in any other context so I don't think your assessment is right. I think that the usage of copy is problematic here.

To me the problem is more than QuerySet doesn't have a copy method but _clone should do.

Do the tests pass with

elif hasattr(child, "resolve_expression"):
    if isinstance(child, QuerySet):
        child = child._clone()
    else:
        child = child.copy()
    ...

We end up with another missing method expected by the Expression's API.

AttributeError: 'QuerySet' object has no attribute 'set_source_expressions'. Did you mean: 'resolve_expression'?

@charettes
Copy link
Member

charettes commented Jan 24, 2024

The problem is that QuerySet has very specialized expression resolving logic as its source expressions are stored in multiple different attributes (annotations, where) and can be composed of other querysets as well that themselves would need to have their references repointed (union, intersection).

That brings the question what exactly should be done here.

  1. Don't support querysets
  2. Allow support for queryets but don't support OuterRef or reference to the filtered expression itself (as that requires complex replacement logic)
  3. Add full support for querysets

1. is the current state of things albeit with an ugly error, 2. is what's proposed here with the pass approach without any warning about the lack of support for OuterRef and references to filtered relations (which would be very hard to implement right), 3. is a large project that has likely minimal benefit given how rare joining using a subquery condition is.

My take is that we should either go with 1. and raise a more appropriate error (it can always be worked around by materializing the queryset by users by doing list(qs.values("pk"))) or 2. and emit a warning of some sort.

I have to admit I have a slight preference for 1. because of the can of worms 2. opens with regards to supporting some querysets and not others as whether a subquery is supported or not is not something that can be easily determined at condition replacement time.

Maybe we can try testing out what the message emitted when making use of OuterRef is and see if we believe it is appropriate?

@ticosax
Copy link
Contributor Author

ticosax commented Jan 24, 2024

Thanks for the clarifications. I agree with your strategy. I'll give it a try to option 2 first, though. As it's already not supported, I don't expect many complains will come out of this option.

@ticosax
Copy link
Contributor Author

ticosax commented Jan 25, 2024

For the record this is what the orm would produce if we would let QuerySet passed as it is.

qs = Author.objects.annotate(
    poem_book=FilteredRelation(
        "book",
        condition=Q(
            book__in=Book.objects.filter(title=OuterRef("book__title"))
        ),
    ),      
).filter(poem_book__isnull=False)

Outputs

SELECT "filtered_relation_author"."id", "filtered_relation_author"."name", "filtered_relation_author"."content_type_id", "filtered_relation_author"."object_id"
FROM "filtered_relation_author
LEFT OUTER JOIN "filtered_relation_book" T3 ON ("filtered_relation_author"."id" = T3."author_id")
INNER JOIN "filtered_relation_book" poem_book ON ("filtered_relation_author"."id" = poem_book."author_id" AND (poem_book."id" IN (SELECT U0."id" FROM "filtered_relation_book" U0 WHERE U0."title" = (T3."title"))))
WHERE poem_book."id" IS NOT NULL

Which is clearly not what we want, and letting it passed, silently, would be damaging to the users.
Good call @charettes

I'm still continuing to make progress on option 2. to see if it seems applicable. stay tuned.

@ticosax ticosax force-pushed the regression-filtered-relation-queryset-as-predicate branch 3 times, most recently from 3be6b5d to f99e02b Compare January 25, 2024 08:22
isinstance(where_child.rhs, ResolvedOuterRef)
for where_child in queryset.query.where.children
):
raise ValueError("Passing OuterRef within a FilteredRelation is not supported.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's worth going this route. Does it catch nested OuterRef() calls?

@felixxm
Copy link
Member

felixxm commented Jan 29, 2024

I'd prefer not to support querysets at all. Especially since it's intended for backporting.

@ticosax ticosax force-pushed the regression-filtered-relation-queryset-as-predicate branch from 9827ebc to 44ee313 Compare January 29, 2024 07:43
@ticosax
Copy link
Contributor Author

ticosax commented Jan 29, 2024

I changed the direction of the PR to prevent passing a QuerySet within a FilteredRelation.

@felixxm felixxm changed the title Fixed #35135 -- Fix FilteredRelation when rhs of join condition is a QuerySet Fixed #35135 -- Made FilteredRelation raise ValueError on querysets as rhs. Jan 29, 2024
@felixxm felixxm force-pushed the regression-filtered-relation-queryset-as-predicate branch from 0581289 to 2a5e593 Compare January 29, 2024 18:48
@felixxm felixxm force-pushed the regression-filtered-relation-queryset-as-predicate branch from 2a5e593 to 820c5f1 Compare January 29, 2024 19:31
@felixxm
Copy link
Member

felixxm commented Jan 29, 2024

@ticosax Thanks 👍

I pushed edits to the release note and comment.

@felixxm
Copy link
Member

felixxm commented Jan 29, 2024

@charettes Does it work for you?

@charettes
Copy link
Member

@felixxm it does, thanks for the patch @ticosax!

I wonder if we should suggest that users materialize the queryset to workaround the problem in the error message. e.g. list(Book.objects.filter(title__istartswith="a").values_list("id", flat=True))). Maybe that's something the rare users who use subquery in this case will be able to figure out on their own?

@felixxm
Copy link
Member

felixxm commented Jan 30, 2024

@felixxm it does, thanks for the patch @ticosax!

I wonder if we should suggest that users materialize the queryset to workaround the problem in the error message. e.g. list(Book.objects.filter(title__istartswith="a").values_list("id", flat=True))). Maybe that's something the rare users who use subquery in this case will be able to figure out on their own?

Thanks for checking 👍 I think it's fine without a hint.

@felixxm felixxm merged commit 820c5f1 into django:main Jan 30, 2024
35 checks passed
@ticosax ticosax deleted the regression-filtered-relation-queryset-as-predicate branch February 8, 2024 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants