-
-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable escaping of href
attribute
#152
Comments
Hi @th0r, thanks for asking this question, and apologies for my slow reply. What you're seeing is actually behavior that's built into libxml2. The function I've given some thought about what it would take for Loofah to support templating, and am open to exploring how to have people opt-in to it. However, we'd have to be very cautious about how we do it to avoid allowing embedded quotes to be rendered unescaped. I'm curious how other people are getting around templating given this behavior? I don't write Rails much anymore so I'm curious if anyone could enlighten me. |
I'll note #160 asks a similar question for |
Hi @flavorjones, thanks for the reply! I was trying to sanitize the user's HTML templates before I processed them with Liquid. But now I've realized that I can just change the order of operations. Instead of I would suggest doing the same thing for the client-side template engine. In that case, you can't run Loofah in the browser, but DOMPurify is very similar (I'm also using that in my frontend.) So instead of I think this might be a better solution, and I wouldn't want to try to hack around |
@ndbroadbent Thanks for closing the loop here. Your approach on templates makes sense. I'm going to close this, since it doesn't seem like there's anything actionable for Loofah at this point. Happy to continue the conversation if anyone wants to, though. |
Hi! 👋 Has anything like this come up again? We're offering Liquid for some user's custom templating, and need to store and retrieve raw (but sanitized) Liquid in the database. Loofah is sanitizing everything perfectly, except that our One not-so-great solution I've come up with is to use Loofah as expected, and then scan the result for markup tags that have been encoded, then decode them. This seems error prone, though. @flavorjones did you have some better idea of how this could work for Loofah based on your previous thinking? Or, does libxml2 only do this to Thank you! |
Hi @bbugh, I linked to the underlying libxml2 code in a comment above, here it is again: https://gitlab.gnome.org/GNOME/libxml2/blob/v2.9.2/HTMLtree.c#L688 The important bit is: if ((cur->ns == NULL) && (cur->parent != NULL) &&
(cur->parent->ns == NULL) &&
((!xmlStrcasecmp(cur->name, BAD_CAST "href")) ||
(!xmlStrcasecmp(cur->name, BAD_CAST "action")) ||
(!xmlStrcasecmp(cur->name, BAD_CAST "src")) ||
((!xmlStrcasecmp(cur->name, BAD_CAST "name")) &&
(!xmlStrcasecmp(cur->parent->name, BAD_CAST "a"))))) { which you can interpret as detecting:
|
Thank you @flavorjones for the interpretation! Sorry I didn't notice that link earlier. I will dig in and try to remember to post the solution here when I get one. |
I also faced this issue, and I came up with a workaround similar to this one #240 (comment) value
.gsub(/(<(?:a|img)[^>]+)(href|src)(=[^>]*>)/, '\1protected-attribute-\2\3')
.then { |v| SANITISERS.inject(Loofah.fragment(v)) { |n, scrubber| n.scrub!(scrubber) } }
.to_s
.gsub(/protected-attribute-(href|src)/, '\1') The regexp might be better because currently, it also handles |
It's probably worth folks following #239 which will (eventually) update Loofah to use Nokogiri's HTML5 parser by default, which does not have this escaping behavior: puts Nokogiri::HTML4.parse('<a href="{{ foo }}" style="{{ foo }}">').to_html
# => <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# <html><body><a href="%7B%7B%20foo%20%7D%7D" style="{{ foo }}"></a></body></html>
puts Nokogiri::HTML5.parse('<a href="{{ foo }}" style="{{ foo }}">').to_html
# => <html><head></head><body><a href="{{ foo }}" style="{{ foo }}"></a></body></html> I think that's the right fix for these situations. |
Problem:
I use Rails and want to sanitize HTML that will be used as input for client-side template engine.
For example the following HTML:
should be sanitized to this:
But the thing is the value for
href
attribute is being escaped for some reason and HTML becomes this:Is there a way to disable encoding of
href
attribute?Reproduction:
Note that
other-attr
value is left untouched.The text was updated successfully, but these errors were encountered: