

RFC: limit libgumbo memory allocations for untrusted HTML5 content #2949

Open
flavorjones opened this issue Aug 9, 2023 · 1 comment
Labels
topic/gumbo Gumbo HTML5 parser topic/memory Segfaults, memory leaks, valgrind testing, etc. topic/rfc

Comments

@flavorjones
Member

Summary

libxml2 has long applied default limits on document size to prevent untrusted documents from triggering an out-of-memory (OOM) condition, which an attacker could exploit as a denial-of-service vector. These limits can be lifted for trusted documents by setting the HUGE parse option.

libgumbo does not have limits like this, and this issue is being created to discuss the need and possible implementations.

Background

This topic was first raised in #2941 where @stevecheckoway and I discussed the shape of the issue.

@flavorjones flavorjones added topic/memory Segfaults, memory leaks, valgrind testing, etc. topic/rfc topic/gumbo Gumbo HTML5 parser labels Aug 9, 2023
@dan42

dan42 commented Aug 23, 2023

It's nice to have "sanity check" limits, but silently truncating content is not good: it's very hard to debug. Please make it raise an error instead. Ideally there would be multiple safeguards, so that we raise on any of:

  • input string > 10MB
  • output mem > 15MB
  • tree depth > 1000

(example arbitrary numbers)
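A minimal Python sketch of the "raise, don't truncate" behavior with all three safeguards, using the arbitrary example thresholds above. `ParseLimits`, `LimitGuard`, `ParseLimitError` and the hook methods are hypothetical names for illustration only; they are not part of libgumbo or Nokogiri. The sketch assumes a parser would call `charge` on every allocation and `push_element`/`pop_element` as elements open and close, so each limit trips at the moment it is crossed.

```python
from dataclasses import dataclass


class ParseLimitError(Exception):
    """Raised as soon as untrusted input crosses a configured limit (hypothetical)."""


@dataclass(frozen=True)
class ParseLimits:
    # Hypothetical config; defaults are the arbitrary example numbers from the comment.
    max_input_bytes: int = 10 * 1024 * 1024   # 10 MB of input
    max_output_bytes: int = 15 * 1024 * 1024  # 15 MB of memory allocated for the tree
    max_tree_depth: int = 1000                # element nesting depth


class LimitGuard:
    """Hypothetical hooks a tree builder would call while parsing.

    Every check raises ParseLimitError instead of truncating the document.
    """

    def __init__(self, limits: ParseLimits = ParseLimits()):
        self.limits = limits
        self.output_bytes = 0
        self.depth = 0

    def check_input(self, html: bytes) -> None:
        # Cheapest check: reject oversized input before doing any parsing work.
        if len(html) > self.limits.max_input_bytes:
            raise ParseLimitError(
                f"input size {len(html)} exceeds {self.limits.max_input_bytes} bytes")

    def charge(self, nbytes: int) -> None:
        # Assumed to be called for every allocation made while building the tree.
        self.output_bytes += nbytes
        if self.output_bytes > self.limits.max_output_bytes:
            raise ParseLimitError(
                f"output memory {self.output_bytes} exceeds "
                f"{self.limits.max_output_bytes} bytes")

    def push_element(self) -> None:
        # Assumed to be called when an element is opened.
        self.depth += 1
        if self.depth > self.limits.max_tree_depth:
            raise ParseLimitError(
                f"tree depth {self.depth} exceeds {self.limits.max_tree_depth}")

    def pop_element(self) -> None:
        # Assumed to be called when an element is closed.
        self.depth -= 1


# Usage: a tiny depth limit makes the fourth nested <div> raise.
guard = LimitGuard(ParseLimits(max_tree_depth=3))
guard.check_input(b"<div>" * 4)
try:
    for _ in range(4):
        guard.push_element()
except ParseLimitError as err:
    print(err)  # prints "tree depth 4 exceeds 3"
```

Enforcing the limits incrementally, rather than validating the finished tree, matters here: a check that runs only after parsing completes cannot prevent the OOM condition it is meant to guard against.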

@flavorjones flavorjones added this to the v1.18.0 milestone Jul 3, 2024