Always reformat tab characters as space characters #262
Comments
The CommonMark Spec 0.20: Preprocessing used to specify:
but this was changed in version 2.1 onward. I'm not sure what the motivation was for the change, but there are two relevant issues on the CommonMark GitHub Project: commonmark-spec#386 and commonmark-spec#318. It's probably worth noting that any tab characters that ever find their way into my Markdown documents are introduced by copy-and-paste and aren't there intentionally.
Has any progress been made on this? I'm very interested in this feature and I'd be open to making a PR if there's interest.
@jgopel, I'm still interested in this feature. I don't think that it's been implemented independently of this Issue.
@hukkin Would you be interested in merging this if I were to make a PR for it?
This comment was marked as off-topic.
Hey, yeah this feature is welcome provided we test extensively to make sure rendered output never changes in some obscure corner case. Fenced code blocks should not be touched.
#466 does the conversion for text inlines. I think this should be done to code spans too, so I'll leave this open.
Description / Summary
The tab character (0x09) is a pest.
Currently, `mdformat` seeks to "apply consistent white space across the board" (Formatting Style: Whitespace) and does the right thing when tab characters appear as leading white space for indentation: it eliminates the pest by replacing them with the appropriate number of space characters for indentation. Line-trailing tabs are also eliminated.
Unfortunately, tab characters in heading and paragraph bodies, where HTML white space collapse will apply when the HTML is rendered for display, are not eliminated by collapsing them into a single space character. I think that tab characters should be eliminated in this context, also, because tabs cause problems.
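A minimal Python sketch of the collapse described above, for heading and paragraph text only. The helper name `collapse_tabs` and the regular expression are assumptions made for illustration, not existing `mdformat` code. It rewrites only white space runs that contain at least one tab, so space-only runs stay as they are.

```python
import re

# Hypothetical helper, not part of mdformat.
# Matches a run of spaces/tabs that contains at least one tab character.
_RUN_WITH_TAB = re.compile(r"[ \t]*\t[ \t]*")


def collapse_tabs(text: str) -> str:
    """Replace every tab-containing white space run with one space."""
    return _RUN_WITH_TAB.sub(" ", text)


single_tab = collapse_tabs("Heading\ttext")
mixed_run = collapse_tabs("some \t\t words")
spaces_only = collapse_tabs("two  spaces")  # no tab, so left unchanged
```

Because HTML white space collapse would render these runs as a single space anyway, a rewrite of this kind should not change rendered output for plain text inlines. It is not meant for fenced code blocks, which should not be touched.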
I believe there are three contexts where tab characters might appear and there's a case for elimination in each:
`<code>` or `<pre>` blocks or (b) expand to the appropriate number of space characters.
I'd propose that always eliminating tab characters and replacing them with the appropriate number of space characters is the way to "apply consistent white space across the board" and that the current mixed treatment of tab characters is inconsistent with `mdformat`'s style goals. Mixed tabs and spaces are seldom good.
There might be an open question with regard to (3), above, because CSS might change the width of tab characters rendered in `<code>` or `<pre>` or other HTML blocks?
Value / benefit
Implementation details
I think that modifying the `TextWrapper` instance attributes here
mdformat/src/mdformat/renderer/_context.py
Lines 330 to 336 in a856f53
to
will achieve the desired white space collapse of a tab character to a space character, but it won't help in collapsing multiple tab-and-space character runs into a single space. The `replace_whitespace` instance attribute would seem to affect all white space characters and not just tab characters.
Tasks to complete
No response
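As a footnote to the implementation details above, the following sketch shows how the standard library `textwrap.TextWrapper` behaves with `replace_whitespace` switched on and off. The `make_wrapper` helper and the width of 80 are assumptions for illustration only; this does not reproduce the attributes `mdformat` actually sets in `_context.py`.

```python
import textwrap


def make_wrapper(replace_whitespace: bool) -> textwrap.TextWrapper:
    # Hypothetical configuration for demonstration, not mdformat's own.
    return textwrap.TextWrapper(
        width=80,
        expand_tabs=False,
        replace_whitespace=replace_whitespace,
    )


# With replace_whitespace=False the tab passes through untouched.
tab_kept = make_wrapper(False).fill("a\tb")

# With replace_whitespace=True each tab becomes exactly one space...
tab_replaced = make_wrapper(True).fill("a\tb")

# ...but a tab-and-space run is replaced character by character,
# so it is not collapsed to a single space.
run_not_collapsed = make_wrapper(True).fill("a\t \tb")

# Other white space characters, such as newlines, are replaced too.
newline_replaced = make_wrapper(True).fill("a\nb")
```

This matches both caveats above: the run `"\t \t"` comes out as three spaces rather than one, and a newline is turned into a space along with the tabs.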