- Notifications
You must be signed in to change notification settings - Fork 8.1k
Format table with emojis results in misaligned table #25671 #26567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Format table with emojis results in misaligned table #25671 #26567
Conversation
| @microsoft-github-policy-service agree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a long-standing Format-Table misalignment issue when displaying emoji characters by implementing proper Unicode surrogate pair handling and emoji width detection in the DisplayCells class. The changes replace reliance on an external package with native emoji width calculation logic that recognizes emojis as occupying 2 terminal cells instead of 1.
Key Changes:
- Refactored
Length()method to use grapheme cluster enumeration viaStringInfo.ParseCombiningCharacters()instead of character-by-character iteration - Added
CodePointLengthInBufferCells()andGraphemeLengthInBufferCells()helper methods to calculate display widths for full Unicode code points and grapheme clusters - Updated
CharLengthInBufferCells()to recognize BMP emoji ranges (U+2600-U+26FF, U+2700-U+27BF, etc.) as wide characters - Enhanced
GetFitLength()to properly handle surrogate pairs during string truncation in both forward and backward directions
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
test_emoji.cs | Standalone test/validation file demonstrating emoji width calculations (should be moved to proper test directory or removed) |
src/System.Management.Automation/System.Management.Automation.csproj | Removed Wcwidth.Net package dependency with comment explaining the removal |
src/System.Management.Automation/FormatAndOutput/common/ILineOutput.cs | Core implementation: added grapheme cluster processing, surrogate pair handling, and comprehensive emoji range detection for proper width calculation |
Comments suppressed due to low confidence (1)
src/System.Management.Automation/FormatAndOutput/common/ILineOutput.cs:56
- This assignment to si is useless, since its value is never read.
System.Globalization.StringInfo si = new System.Globalization.StringInfo(str); 💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| (codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols | ||
| { | ||
| return 2; | ||
| } |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GraphemeLengthInBufferCells() method doesn't handle wide CJK characters properly. It only checks for emoji ranges but ignores the existing CJK width logic in CharLengthInBufferCells(). A grapheme containing CJK characters should still be calculated as 2 cells wide, but this implementation defaults to returning 1 (line 95).
Consider falling back to CharLengthInBufferCells() for BMP characters that aren't emojis, similar to how CodePointLengthInBufferCells() does on lines 182-184.
| } | |
| } | |
| // For BMP characters, use CharLengthInBufferCells to handle CJK/wide logic | |
| if (codePoint <= 0xFFFF) | |
| return CharLengthInBufferCells((char)codePoint); |
| <!-- the following package(s) are from the powershell org --> | ||
| <PackageReference Include="Microsoft.Management.Infrastructure" Version="3.0.0" /> | ||
| <PackageReference Include="Microsoft.PowerShell.Native" Version="700.0.0-preview.2" /> | ||
| <!-- Removed Wcwidth.Net package reference --> |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The comment about removing Wcwidth.Net package reference doesn't add useful information and creates an orphaned comment line. The comment should either be removed entirely or clarified to explain why the package was removed (e.g., "Native emoji width calculation implemented to replace Wcwidth.Net dependency").
| <!-- Removed Wcwidth.Net package reference --> | |
| <!-- Removed Wcwidth.Net package reference: native emoji width calculation implemented to replace Wcwidth.Net dependency --> |
| // Use StringInfo to enumerate grapheme clusters | ||
| int length = 0; | ||
| for (; offset < str.Length; offset++) | ||
| System.Globalization.StringInfo si = new System.Globalization.StringInfo(str); | ||
| int[] textElementIndexes = System.Globalization.StringInfo.ParseCombiningCharacters(str); | ||
| for (int i = 0; i < textElementIndexes.Length; i++) | ||
| { | ||
| length += CharLengthInBufferCells(str[offset]); | ||
| if (i < offset) continue; | ||
| int start = textElementIndexes[i]; | ||
| int graphemeLength = (i + 1 < textElementIndexes.Length) | ||
| ? textElementIndexes[i + 1] - textElementIndexes[i] | ||
| : str.Length - textElementIndexes[i]; | ||
| string grapheme = str.Substring(start, graphemeLength); | ||
| length += GraphemeLengthInBufferCells(grapheme); | ||
| } | ||
| | ||
| return length; |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance concern: Using StringInfo.ParseCombiningCharacters() and Substring() for every call to Length() can be significantly slower than the previous character-by-character iteration, especially for long strings. ParseCombiningCharacters() allocates an array and performs multiple string scans.
For most strings (which don't contain complex grapheme clusters), this is unnecessary overhead. Consider adding a fast path that only uses the grapheme cluster approach when necessary:
// Fast path: check if string likely contains complex graphemes bool needsGraphemeProcessing = str.Any(c => char.IsSurrogate(c) || c >= 0x0300); if (!needsGraphemeProcessing) { // Use simpler character-by-character processing } else { // Use grapheme cluster processing }| /// <summary> | ||
| /// Calculate the buffer cell length of a grapheme cluster (text element). | ||
| /// </summary> | ||
| /// <param name="grapheme">A string representing a single grapheme cluster.</param> | ||
| /// <returns>Number of buffer cells the grapheme needs to take.</returns> | ||
| protected virtual int GraphemeLengthInBufferCells(string grapheme) | ||
| { | ||
| if (string.IsNullOrEmpty(grapheme)) | ||
| return 0; | ||
| | ||
| // Check if the grapheme is an emoji (basic Unicode ranges) | ||
| int codePoint = char.ConvertToUtf32(grapheme, 0); | ||
| // Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc. | ||
| if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons | ||
| (codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs | ||
| (codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map | ||
| (codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols | ||
| (codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats | ||
| (codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs | ||
| (codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A | ||
| (codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols | ||
| { | ||
| return 2; | ||
| } | ||
| // Default width for other graphemes | ||
| return 1; | ||
| } |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent indentation: This method has excessive indentation (appears to be indented as if it's inside another block). It should align with other methods in the class.
| /// <summary> | |
| /// Calculate the buffer cell length of a grapheme cluster (text element). | |
| /// </summary> | |
| /// <param name="grapheme">A string representing a single grapheme cluster.</param> | |
| /// <returns>Number of buffer cells the grapheme needs to take.</returns> | |
| protected virtual int GraphemeLengthInBufferCells(string grapheme) | |
| { | |
| if (string.IsNullOrEmpty(grapheme)) | |
| return 0; | |
| // Check if the grapheme is an emoji (basic Unicode ranges) | |
| int codePoint = char.ConvertToUtf32(grapheme, 0); | |
| // Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc. | |
| if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons | |
| (codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs | |
| (codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map | |
| (codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols | |
| (codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats | |
| (codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs | |
| (codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A | |
| (codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols | |
| { | |
| return 2; | |
| } | |
| // Default width for other graphemes | |
| return 1; | |
| } | |
| /// <summary> | |
| /// Calculate the buffer cell length of a grapheme cluster (text element). | |
| /// </summary> | |
| /// <param name="grapheme">A string representing a single grapheme cluster.</param> | |
| /// <returns>Number of buffer cells the grapheme needs to take.</returns> | |
| protected virtual int GraphemeLengthInBufferCells(string grapheme) | |
| { | |
| if (string.IsNullOrEmpty(grapheme)) | |
| return 0; | |
| // Check if the grapheme is an emoji (basic Unicode ranges) | |
| int codePoint = char.ConvertToUtf32(grapheme, 0); | |
| // Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc. | |
| if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons | |
| (codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs | |
| (codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map | |
| (codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols | |
| (codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats | |
| (codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs | |
| (codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A | |
| (codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols | |
| { | |
| return 2; | |
| } | |
| // Default width for other graphemes | |
| return 1; | |
| } |
| // Check if the grapheme is an emoji (basic Unicode ranges) | ||
| int codePoint = char.ConvertToUtf32(grapheme, 0); | ||
| // Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc. | ||
| if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons | ||
| (codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs | ||
| (codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map | ||
| (codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols | ||
| (codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats | ||
| (codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs | ||
| (codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A | ||
| (codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols | ||
| { | ||
| return 2; | ||
| } | ||
| // Default width for other graphemes | ||
| return 1; | ||
| } |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GraphemeLengthInBufferCells() method only checks the first code point of a grapheme cluster (line 81), but grapheme clusters can contain multiple code points (e.g., base character + combining marks, or emoji with skin tone modifiers). This will give incorrect results for complex graphemes.
For example, "👨👩👧" (family emoji) is a single grapheme but consists of multiple code points joined with zero-width joiners. The current implementation would only check the first code point.
Consider using a more comprehensive approach that accounts for the full grapheme structure, or leveraging existing Unicode width calculation libraries.
| // Check if the grapheme is an emoji (basic Unicode ranges) | |
| int codePoint = char.ConvertToUtf32(grapheme, 0); | |
| // Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc. | |
| if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons | |
| (codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs | |
| (codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map | |
| (codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols | |
| (codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats | |
| (codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs | |
| (codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A | |
| (codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols | |
| { | |
| return 2; | |
| } | |
| // Default width for other graphemes | |
| return 1; | |
| } | |
| int width = 0; | |
| for (int i = 0; i < grapheme.Length;) | |
| { | |
| int codePoint = char.ConvertToUtf32(grapheme, i); | |
| // Skip zero-width joiner (U+200D) and combining marks | |
| if (codePoint == 0x200D || CharUnicodeInfo.GetUnicodeCategory(grapheme, i) == UnicodeCategory.NonSpacingMark) | |
| { | |
| i += char.IsSurrogatePair(grapheme, i) ? 2 : 1; | |
| continue; | |
| } | |
| width += CharLengthInBufferCells(char.ConvertFromUtf32(codePoint), codePoint); | |
| i += char.IsSurrogatePair(grapheme, i) ? 2 : 1; | |
| } | |
| return width; | |
| } |
| using System.Management.Automation.Internal; | ||
| using System.Text; | ||
| | ||
| |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Unnecessary blank line added. This extra whitespace doesn't serve any purpose and should be removed.
| (codePoint >= 0x1F0A0 && codePoint <= 0x1F0FF) || // Playing Cards | ||
| (codePoint >= 0x1F100 && codePoint <= 0x1F64F) || // Enclosed Alphanumeric Supplement, Emoticons | ||
| (codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport and Map Symbols | ||
| (codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols and Pictographs |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overlapping ranges detected: The range 0x1F300-0x1F9FF on line 163 already includes 0x1F900-0x1F9FF (line 168). The duplicate check on line 168 is redundant and should be removed.
| (codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols and Pictographs |
| (codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols and Pictographs Extended-A | ||
| (codePoint >= 0x2600 && codePoint <= 0x26FF) || // Miscellaneous Symbols (includes some emojis) | ||
| (codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats | ||
| (codePoint >= 0x1F170 && codePoint <= 0x1F251) || // Enclosed Alphanumeric Supplement |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overlapping ranges detected: The range 0x1F100-0x1F64F on line 166 already includes 0x1F170-0x1F251 (line 173). The duplicate check on line 173 is redundant and should be removed.
| (codePoint >= 0x1F170 && codePoint <= 0x1F251) || // Enclosed Alphanumeric Supplement |
| // Use StringInfo to enumerate grapheme clusters | ||
| int length = 0; | ||
| for (; offset < str.Length; offset++) | ||
| System.Globalization.StringInfo si = new System.Globalization.StringInfo(str); |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable si is declared but never used. It can be removed since only ParseCombiningCharacters() is needed here.
| System.Globalization.StringInfo si = new System.Globalization.StringInfo(str); |
| else if (!startFromHead && char.IsLowSurrogate(str[k]) && k - 1 >= kFinal && char.IsHighSurrogate(str[k - 1])) | ||
| { | ||
| // This is a surrogate pair when going backward - skip it since we'll process with the high surrogate | ||
| k = startFromHead ? (k + 1) : (k - 1); |
Copilot AI Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Condition always evaluates to 'false'.
| k = startFromHead ? (k + 1) : (k - 1); | |
| k = k - 1; |
| This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
PR Summary
Fix Format-Table misalignment when displaying emojis by implementing proper Unicode surrogate pair handling and emoji width detection.
PR Context
Fixes #25671 (and likely duplicates #23861, #6290)
Format-Table was incorrectly calculating the display width of emojis, treating them as 1 cell wide when they actually occupy 2 cells on the terminal. This caused misalignment where columns after emoji-containing cells would have extra spacing.
The root cause was that the character width calculation in
DisplayCells.CharLengthInBufferCells()did not:This PR updates the width calculation logic in
ILineOutput.csto:CodePointLengthInBufferCells()method to handle full Unicode code pointsGetFitLength()method to handle surrogate pairs when truncating stringsPR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerFormat-Tablewith emojis results in misaligned table #25671