Skip to content

Conversation

@menochs1
Copy link

@menochs1 menochs1 commented Dec 3, 2025

PR Summary

Fix Format-Table misalignment when displaying emojis by implementing proper Unicode surrogate pair handling and emoji width detection.

PR Context

Fixes #25671 (and likely duplicates #23861, #6290)

Format-Table was incorrectly calculating the display width of emojis, treating them as 1 cell wide when they actually occupy 2 cells on the terminal. This caused misalignment where columns after emoji-containing cells would have extra spacing.

The root cause was that the character width calculation in DisplayCells.CharLengthInBufferCells() did not:

  1. Handle UTF-16 surrogate pairs (used by most emojis outside the Basic Multilingual Plane)
  2. Recognize emoji characters in the BMP (U+2600-U+26FF range) as wide characters

This PR updates the width calculation logic in ILineOutput.cs to:

  • Detect and properly process surrogate pairs as single code points
  • Add a new CodePointLengthInBufferCells() method to handle full Unicode code points
  • Recognize common emoji ranges (Miscellaneous Symbols, Dingbats, etc.) as 2-cell wide
  • Update the GetFitLength() method to handle surrogate pairs when truncating strings

PR Checklist

Copilot AI review requested due to automatic review settings December 3, 2025 04:27
@menochs1 menochs1 requested review from a team and jshigetomi as code owners December 3, 2025 04:27
@menochs1
Copy link
Author

menochs1 commented Dec 3, 2025

@microsoft-github-policy-service agree

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a long-standing Format-Table misalignment issue when displaying emoji characters by implementing proper Unicode surrogate pair handling and emoji width detection in the DisplayCells class. The changes replace reliance on an external package with native emoji width calculation logic that recognizes emojis as occupying 2 terminal cells instead of 1.

Key Changes:

  • Refactored Length() method to use grapheme cluster enumeration via StringInfo.ParseCombiningCharacters() instead of character-by-character iteration
  • Added CodePointLengthInBufferCells() and GraphemeLengthInBufferCells() helper methods to calculate display widths for full Unicode code points and grapheme clusters
  • Updated CharLengthInBufferCells() to recognize BMP emoji ranges (U+2600-U+26FF, U+2700-U+27BF, etc.) as wide characters
  • Enhanced GetFitLength() to properly handle surrogate pairs during string truncation in both forward and backward directions

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 16 comments.

File Description
test_emoji.cs Standalone test/validation file demonstrating emoji width calculations (should be moved to proper test directory or removed)
src/System.Management.Automation/System.Management.Automation.csproj Removed Wcwidth.Net package dependency with comment explaining the removal
src/System.Management.Automation/FormatAndOutput/common/ILineOutput.cs Core implementation: added grapheme cluster processing, surrogate pair handling, and comprehensive emoji range detection for proper width calculation
Comments suppressed due to low confidence (1)

src/System.Management.Automation/FormatAndOutput/common/ILineOutput.cs:56

  • This assignment to si is useless, since its value is never read.
 System.Globalization.StringInfo si = new System.Globalization.StringInfo(str); 

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

(codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols
{
return 2;
}
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GraphemeLengthInBufferCells() method doesn't handle wide CJK characters properly. It only checks for emoji ranges but ignores the existing CJK width logic in CharLengthInBufferCells(). A grapheme containing CJK characters should still be calculated as 2 cells wide, but this implementation defaults to returning 1 (line 95).

Consider falling back to CharLengthInBufferCells() for BMP characters that aren't emojis, similar to how CodePointLengthInBufferCells() does on lines 182-184.

Suggested change
}
}
// For BMP characters, use CharLengthInBufferCells to handle CJK/wide logic
if (codePoint <= 0xFFFF)
return CharLengthInBufferCells((char)codePoint);
Copilot uses AI. Check for mistakes.
<!-- the following package(s) are from the powershell org -->
<PackageReference Include="Microsoft.Management.Infrastructure" Version="3.0.0" />
<PackageReference Include="Microsoft.PowerShell.Native" Version="700.0.0-preview.2" />
<!-- Removed Wcwidth.Net package reference -->
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment about removing Wcwidth.Net package reference doesn't add useful information and creates an orphaned comment line. The comment should either be removed entirely or clarified to explain why the package was removed (e.g., "Native emoji width calculation implemented to replace Wcwidth.Net dependency").

Suggested change
<!-- Removed Wcwidth.Net package reference -->
<!-- Removed Wcwidth.Net package reference: native emoji width calculation implemented to replace Wcwidth.Net dependency -->
Copilot uses AI. Check for mistakes.
Comment on lines +54 to 68
// Use StringInfo to enumerate grapheme clusters
int length = 0;
for (; offset < str.Length; offset++)
System.Globalization.StringInfo si = new System.Globalization.StringInfo(str);
int[] textElementIndexes = System.Globalization.StringInfo.ParseCombiningCharacters(str);
for (int i = 0; i < textElementIndexes.Length; i++)
{
length += CharLengthInBufferCells(str[offset]);
if (i < offset) continue;
int start = textElementIndexes[i];
int graphemeLength = (i + 1 < textElementIndexes.Length)
? textElementIndexes[i + 1] - textElementIndexes[i]
: str.Length - textElementIndexes[i];
string grapheme = str.Substring(start, graphemeLength);
length += GraphemeLengthInBufferCells(grapheme);
}

return length;
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance concern: Using StringInfo.ParseCombiningCharacters() and Substring() for every call to Length() can be significantly slower than the previous character-by-character iteration, especially for long strings. ParseCombiningCharacters() allocates an array and performs multiple string scans.

For most strings (which don't contain complex grapheme clusters), this is unnecessary overhead. Consider adding a fast path that only uses the grapheme cluster approach when necessary:

// Fast path: check if string likely contains complex graphemes bool needsGraphemeProcessing = str.Any(c => char.IsSurrogate(c) || c >= 0x0300); if (!needsGraphemeProcessing) { // Use simpler character-by-character processing } else { // Use grapheme cluster processing }
Copilot uses AI. Check for mistakes.
Comment on lines +70 to +96
/// <summary>
/// Calculate the buffer cell length of a grapheme cluster (text element).
/// </summary>
/// <param name="grapheme">A string representing a single grapheme cluster.</param>
/// <returns>Number of buffer cells the grapheme needs to take.</returns>
protected virtual int GraphemeLengthInBufferCells(string grapheme)
{
if (string.IsNullOrEmpty(grapheme))
return 0;

// Check if the grapheme is an emoji (basic Unicode ranges)
int codePoint = char.ConvertToUtf32(grapheme, 0);
// Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc.
if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons
(codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs
(codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map
(codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols
(codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats
(codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs
(codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A
(codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols
{
return 2;
}
// Default width for other graphemes
return 1;
}
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent indentation: This method has excessive indentation (appears to be indented as if it's inside another block). It should align with other methods in the class.

Suggested change
/// <summary>
/// Calculate the buffer cell length of a grapheme cluster (text element).
/// </summary>
/// <param name="grapheme">A string representing a single grapheme cluster.</param>
/// <returns>Number of buffer cells the grapheme needs to take.</returns>
protected virtual int GraphemeLengthInBufferCells(string grapheme)
{
if (string.IsNullOrEmpty(grapheme))
return 0;
// Check if the grapheme is an emoji (basic Unicode ranges)
int codePoint = char.ConvertToUtf32(grapheme, 0);
// Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc.
if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons
(codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs
(codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map
(codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols
(codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats
(codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs
(codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A
(codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols
{
return 2;
}
// Default width for other graphemes
return 1;
}
/// <summary>
/// Calculate the buffer cell length of a grapheme cluster (text element).
/// </summary>
/// <param name="grapheme">A string representing a single grapheme cluster.</param>
/// <returns>Number of buffer cells the grapheme needs to take.</returns>
protected virtual int GraphemeLengthInBufferCells(string grapheme)
{
if (string.IsNullOrEmpty(grapheme))
return 0;
// Check if the grapheme is an emoji (basic Unicode ranges)
int codePoint = char.ConvertToUtf32(grapheme, 0);
// Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc.
if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons
(codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs
(codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map
(codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols
(codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats
(codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs
(codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A
(codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols
{
return 2;
}
// Default width for other graphemes
return 1;
}
Copilot uses AI. Check for mistakes.
Comment on lines +80 to +96
// Check if the grapheme is an emoji (basic Unicode ranges)
int codePoint = char.ConvertToUtf32(grapheme, 0);
// Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc.
if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons
(codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs
(codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map
(codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols
(codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats
(codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs
(codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A
(codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols
{
return 2;
}
// Default width for other graphemes
return 1;
}
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GraphemeLengthInBufferCells() method only checks the first code point of a grapheme cluster (line 81), but grapheme clusters can contain multiple code points (e.g., base character + combining marks, or emoji with skin tone modifiers). This will give incorrect results for complex graphemes.

For example, "👨‍👩‍👧" (family emoji) is a single grapheme but consists of multiple code points joined with zero-width joiners. The current implementation would only check the first code point.

Consider using a more comprehensive approach that accounts for the full grapheme structure, or leveraging existing Unicode width calculation libraries.

Suggested change
// Check if the grapheme is an emoji (basic Unicode ranges)
int codePoint = char.ConvertToUtf32(grapheme, 0);
// Emoji ranges: Emoticons, Misc Symbols, Dingbats, Transport, etc.
if ((codePoint >= 0x1F600 && codePoint <= 0x1F64F) || // Emoticons
(codePoint >= 0x1F300 && codePoint <= 0x1F5FF) || // Misc Symbols & Pictographs
(codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport & Map
(codePoint >= 0x2600 && codePoint <= 0x26FF) || // Misc symbols
(codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats
(codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols & Pictographs
(codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols & Pictographs Extended-A
(codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF)) // Regional Indicator Symbols
{
return 2;
}
// Default width for other graphemes
return 1;
}
int width = 0;
for (int i = 0; i < grapheme.Length;)
{
int codePoint = char.ConvertToUtf32(grapheme, i);
// Skip zero-width joiner (U+200D) and combining marks
if (codePoint == 0x200D || CharUnicodeInfo.GetUnicodeCategory(grapheme, i) == UnicodeCategory.NonSpacingMark)
{
i += char.IsSurrogatePair(grapheme, i) ? 2 : 1;
continue;
}
width += CharLengthInBufferCells(char.ConvertFromUtf32(codePoint), codePoint);
i += char.IsSurrogatePair(grapheme, i) ? 2 : 1;
}
return width;
}
Copilot uses AI. Check for mistakes.
using System.Management.Automation.Internal;
using System.Text;


Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Unnecessary blank line added. This extra whitespace doesn't serve any purpose and should be removed.

Suggested change
Copilot uses AI. Check for mistakes.
(codePoint >= 0x1F0A0 && codePoint <= 0x1F0FF) || // Playing Cards
(codePoint >= 0x1F100 && codePoint <= 0x1F64F) || // Enclosed Alphanumeric Supplement, Emoticons
(codePoint >= 0x1F680 && codePoint <= 0x1F6FF) || // Transport and Map Symbols
(codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols and Pictographs
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overlapping ranges detected: The range 0x1F300-0x1F9FF on line 163 already includes 0x1F900-0x1F9FF (line 168). The duplicate check on line 168 is redundant and should be removed.

Suggested change
(codePoint >= 0x1F900 && codePoint <= 0x1F9FF) || // Supplemental Symbols and Pictographs
Copilot uses AI. Check for mistakes.
(codePoint >= 0x1FA70 && codePoint <= 0x1FAFF) || // Symbols and Pictographs Extended-A
(codePoint >= 0x2600 && codePoint <= 0x26FF) || // Miscellaneous Symbols (includes some emojis)
(codePoint >= 0x2700 && codePoint <= 0x27BF) || // Dingbats
(codePoint >= 0x1F170 && codePoint <= 0x1F251) || // Enclosed Alphanumeric Supplement
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overlapping ranges detected: The range 0x1F100-0x1F64F on line 166 already includes 0x1F170-0x1F251 (line 173). The duplicate check on line 173 is redundant and should be removed.

Suggested change
(codePoint >= 0x1F170 && codePoint <= 0x1F251) || // Enclosed Alphanumeric Supplement
Copilot uses AI. Check for mistakes.
// Use StringInfo to enumerate grapheme clusters
int length = 0;
for (; offset < str.Length; offset++)
System.Globalization.StringInfo si = new System.Globalization.StringInfo(str);
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable si is declared but never used. It can be removed since only ParseCombiningCharacters() is needed here.

Suggested change
System.Globalization.StringInfo si = new System.Globalization.StringInfo(str);
Copilot uses AI. Check for mistakes.
else if (!startFromHead && char.IsLowSurrogate(str[k]) && k - 1 >= kFinal && char.IsHighSurrogate(str[k - 1]))
{
// This is a surrogate pair when going backward - skip it since we'll process with the high surrogate
k = startFromHead ? (k + 1) : (k - 1);
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condition always evaluates to 'false'.

Suggested change
k = startFromHead ? (k + 1) : (k - 1);
k = k - 1;
Copilot uses AI. Check for mistakes.
@microsoft-github-policy-service microsoft-github-policy-service bot added the Review - Needed The PR is being reviewed label Dec 10, 2025
@microsoft-github-policy-service
Copy link
Contributor

This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days.
Maintainer, please provide feedback and/or mark it as Waiting on Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review - Needed The PR is being reviewed

3 participants