Revisions to How can I convert full-width characters to half-width characters (and vice versa)?

added 124 characters in body

edited Mar 1, 2024 at 11:56

587.9k
96
1.1k
1.7k

On some systems, you may be able to do the same with tr or sed if in the C.UTF-8 locale, not GNU ones AFAICT.

For the reverse conversion, just change the y/from/to/ to y/to/from.

On some systems, you may be able to do the same with tr in the C.UTF-8 locale, though not with current versions of GNU tr unless your OS vendor has patched it.

$ uname
FreeBSD
$ echo 'Ａb９８７６５４３２１０' | LC_ALL=C.UTF-8 tr $'\u3000\uff01-\uff5e' ' !-~'
Ab9876543210

(This also assumes a shell that supports zsh's $'\uXXXX' notation.)

For the reverse conversion, just change the y/from/to/ to y/to/from/.

added 224 characters in body

edited Mar 1, 2024 at 11:35

Stéphane Chazelas

Or, using the NFKD decomposition of those characters whose decomposition type is Wide:

perl -MUnicode::Normalize=NFKD -C -pe 's/\p{Dt=Wide}/NFKD$&/ge'

If the aim is to convert to ASCII, then on GNU systems at least, iconv -t ASCII//translit would also convert those (and more characters, to their closest ASCII character¹ representation).

added 401 characters in body

edited Mar 1, 2024 at 10:37

Stéphane Chazelas

perl also has an interface to the Unicode data in its Unicode::UCD module, so you could also do:

perl -C -MUnicode::UCD=charprop -pe '
  s{\p{Decomposition_Type: Wide}}{
    $cache{$&} //= charprop(ord($&), "Decomposition_Mapping")
  }ge'

It's quite slow, though, even with the caching used here to mitigate that. See perldoc perluniprops and perldoc Unicode::UCD for details.
