added 124 characters in body
Stéphane Chazelas
  • 587.9k
  • 96
  • 1.1k
  • 1.7k

On some systems, you may be able to do the same with tr if in the C.UTF-8 locale, not with current versions of GNU tr unless patched by your OS vendor.

    $ uname
    FreeBSD
    $ echo 'Ａｂ９８７６５４３２１０' | LC_ALL=C.UTF-8 tr $'\u3000\uff01-\uff5e' ' !-~'
    Ab9876543210

(also assuming a shell with support for zsh's $'\uXXXX').

For the reverse conversion, just change the y/from/to/ to y/to/from/.

added 224 characters in body

Or using the NFKD decomposition for those characters that have a wide decomposition type:

perl -MUnicode::Normalize=NFKD -C -pe 's/\p{Dt=Wide}/NFKD$&/ge' 

If it's to convert to ASCII, on GNU systems at least, iconv -t ASCII//translit would also convert those (and more) characters to their closest ASCII representation¹.


added 401 characters in body

perl also has an interface to the Unicode data in its Unicode::UCD module, so you could also do:

    perl -C -MUnicode::UCD=charprop -pe '
      s{\p{Decomposition_Type: Wide}}{
        $cache{$&} //= charprop(ord($&), "Decomposition_Mapping")
      }ge'

Though it's quite slow, even with the caching used here to mitigate that. See perldoc perluniprops and perldoc Unicode::UCD for details.

