New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed #30686 -- Used Python HTMLParser in utils.text.Truncator #16421
Conversation
We should bear in mind 7f65974 |
64e5739
to
9af6c3b
Compare
Having checked out my branch / PR to
|
@matthiask you're the domain expert here. Can I ask you to (re-)review with a mind to are we doing it? — thanks! 🎁 |
I added some suggested changes in smithdc1#3. |
OK... so... what motivated the whole ticket here was a persistent trickle of security reports about ever more ingenious ways of making the regex misbehave. That problem is intractable, hence Let's use a parser. As far as I can see, adjusting the tests there to show that the input is (quickly) rejected would be sufficient. (We need to ensure that we don't open a DoS vector, not necessarily handle such deviant input.) //cc @django/django-security-team |
#16421 (comment) — Using the parser is quicker. I think we're green. @smithdc1 if you want to look at @ngnpope's comments, and resolve, we should be able to push this forwards. Thanks for effort! 🏅 (I will mark as |
Sorry for not reacting sooner here. The speed up is a pleasant and unexpected surprise. I do not have any further comments apart from LGTM, thanks! |
Looks great! May I ask about how invalid html is treated before and after the patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @smithdc1! Thanks for the update. I've had another thorough pass through this. Let me know if you want any help getting this progressed 🙂
c4f6095
to
4c09d1c
Compare
Moved 2 commits to the #17071. |
Please rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David. Definitely feels like we're getting closer.
@@ -41,7 +41,7 @@ def test_truncate_unicode(self): | |||
) | |||
|
|||
def test_truncate_something(self): | |||
self.assertEqual(truncatechars_html("a<b>b</b>c", 3), "a<b>b</b>c") | |||
self.assertEqual(truncatechars_html("a<b>b</b>c", 3), "a<b>b</b>…") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so this is definitely a regression we're introducing here that we should fix.
I added the following test to tests/template_tests/filter_tests/test_truncatechars.py
which passes fine:
@setup({"truncatechars04": "{{ a|truncatechars:3 }}"})
def test_truncatechars01(self):
output = self.engine.render_to_string(
"truncatechars04", {"a": "abc"}
)
self.assertEqual(output, "abc")
I think we should add this as it covers the exact length case which is missing there.
The HTML chars case is regressing. The problem is on this line:
We are passing in length=truncate_len
when we should be passing in length=length
, as for the words case. When you look at the text chars case we are passing through both length
and truncate_len
.
I suspect that we will want to hoist the following into a global helper:
Which we can then call independently from the appropriate place in the text branch or the HTML branch (inside of TruncateCharsHTMLParser
).
@smithdc1 Hey! Is this ready for re-review? If yes perhaps we should unset the needs improvement flag in the ticket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial comments, looks good!
There are unresolved comments, e.g. #16421 (comment) or #16421 (comment). |
I believe the first one is already addressed in 4f1791c |
@smithdc1 Hi! Happy New Year 🎆 Would you have time to keep working on this? If not I might try to push it to the finish line. Let me know! Thanks. |
Hi @nessita -- happy new year. Hope you had a good break! 🎆 Yes. Let me update this one. I'll have to remind myself of where I got to! |
4f1791c
to
b017059
Compare
5d6beea
to
b3a575c
Compare
@smithdc1 Thanks for all your efforts 👍 |
Yes, thanks @smithdc1. This is a solid improvement. Also thanks for enduring my tweaks/reviews! |
Great improvement @smithdc1 👏🏻 |
Marking as draft -- it's not finished but thought it may be helpful to share my thinking as the ticket is being reviewed. Even if the proposal isn't valid, maybe the extra tests are worth keeping.
https://code.djangoproject.com/ticket/30686