[draft] default to html5 parsing #239
allow us to add sanitized values with arbitrary keys to the data file
this removes noisy bad attributes from output for unsafe nodes that are being escaped
72f91b3 to 0bff3f6
0bff3f6 to 9ab4a31
See downstream rails/rails-html-sanitizer#133 for an indication of the impact of this change.
Running some large Rails apps' CI with the HTML5 parser, I found another notable behavioral difference that didn't show up in this test suite: binary attributes which don't have values in the libxml2 HTML4 parser (
I'd love to get some benchmarks just to understand the impact of making this change. It's unlikely to change my mind, but it would be good information to have handy.
I've run three large Rails apps at Shopify through CI with this branch of Loofah and (except for three tests that were relying on the above binary attribute behavior) everything was green. This is encouraging.
9ab4a31 to 76e40a8
Note that there is an escape hatch: set the environment variable LOOFAH_HTML4_MODE to return to the previous behavior. CI tests both HTML4 and HTML5 modes.
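To illustrate the shape of such an escape hatch, here is a minimal sketch of environment-driven mode selection. The helper name `parser_mode` is hypothetical, and the rule that any non-empty value of LOOFAH_HTML4_MODE selects HTML4 is an assumption; this is not taken from the Loofah branch itself.

```ruby
# Hypothetical helper, NOT Loofah's actual implementation.
# Assumption: any non-empty value of LOOFAH_HTML4_MODE selects the
# previous (HTML4) behavior; otherwise HTML5 is the default.
def parser_mode(env = ENV)
  value = env["LOOFAH_HTML4_MODE"]
  (value.nil? || value.empty?) ? :html5 : :html4
end

# Passing a plain Hash instead of ENV keeps the example easy to test.
puts parser_mode({})
puts parser_mode({ "LOOFAH_HTML4_MODE" => "1" })
```

Taking the environment as a parameter means a test suite could exercise both modes in one process, which is roughly what running CI in both modes achieves at the job level.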
76e40a8 to 8b75439
Would it make sense to get this merged with the default set to the existing HTML4, so apps can opt in to the HTML5 behavior and start testing? That would presumably make this a far safer change to release.
@DanielHeath thanks for asking. There's a bit of a yak shave of dependencies here. Loofah can only support HTML5 in Nokogiri >= 1.14.0 because it requires the subclassing fix at sparklemotion/nokogiri@ebde7da. I'm hoping to get that Nokogiri release out in an RC this weekend if I can finish Ruby 3.2 support in rake-compiler-dock.
I picked this back up again this week, now that Nokogiri 1.14.0.rc1 has been prereleased. This work is mostly done, though there are still details to work through here and in rails-html-sanitizer.
Closing this in favor of #261, which has a much better API for introducing HTML5 support.
The libgumbo parser used by Nokogiri::HTML5 is superior to the libxml2 parser used by Nokogiri::HTML4 (the default).
This is a draft pull request to see how hard it would be to default to that parser for Loofah's sanitization, and to evaluate what changes might be breaking for the many Rails apps that use it.
Note that Loofah can only support HTML5 in Nokogiri >= 1.14.0 because it requires the subclassing fix at sparklemotion/nokogiri@ebde7da.
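A version gate like this can be sketched with RubyGems' own version comparison. The helper `html5_capable?` and the constant `MINIMUM_NOKOGIRI` are hypothetical names for illustration. Treating prereleases such as 1.14.0.rc1 as capable is an assumption, based on the comment above that work resumed once the RC was out.

```ruby
require "rubygems"

# Hypothetical names for illustration; not part of Loofah or Nokogiri.
MINIMUM_NOKOGIRI = Gem::Version.new("1.14.0")

# Returns true when the given Nokogiri version string is new enough.
# Assumption: a prerelease like "1.14.0.rc1" already carries the fix, so
# we compare its release version (prerelease suffix dropped).
def html5_capable?(version_string)
  Gem::Version.new(version_string).release >= MINIMUM_NOKOGIRI
end

puts html5_capable?("1.13.10")
puts html5_capable?("1.14.0.rc1")
```

Without `release`, RubyGems orders a prerelease before its final version, so a plain `>=` comparison against 1.14.0 would reject the RC.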
See a related but orthogonal issue to default Nokogiri to HTML5: sparklemotion/nokogiri#2331