I am pretty sure my regexes are fine but they don't work with bash. I crafted them myself using https://unicode.org/charts/. As you will see, they work properly with awk.

Here are the ranges to spare you the need to check them yourself, especially if you don't know Japanese:

  • hiragana [ぁ-ゟ]

    • ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖ>゙>゚__ゝゞゟ
  • katakana [゠-ヿㇰ-ㇿ!-○]

    • ゠ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ・ーヽヾヿ
    • ㇰㇱㇲㇳㇴㇵㇶㇷㇸㇹㇺㇻㇼㇽㇾㇿ
    • !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ¢£¬ ̄¦¥₩│←↑→↓■○

I also have a regex to find kanji, [一-龥], but this one works as intended in bash.
The ">>> wrong!" markers are comments I added to pinpoint where the problems are.

    $ [[ "する" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
    is hiragana
    $ echo 'する' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'
    is hiragana
    $ [[ "スル" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
    is hiragana >>> wrong!
    $ echo 'スル' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'
    $ [[ "僕" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
    is not hiragana
    $ echo '僕' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'
    $ [[ "する" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
    is katakana >>> wrong!
    $ echo 'する' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'
    $ [[ "スル" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
    is katakana
    $ echo 'スル' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'
    is katakana
    $ [[ "僕" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
    is not katakana
    $ echo '僕' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'

It's as if bash considers hiragana and katakana to be equivalent, as if it converts them beforehand or something?

  • I use Kitty, bash version is "5.2.2(1)-release (x86_64-pc-linux-gnu)", and LC_ALL is not set but the rest is utf8. I set LC_ALL as en_US.UTF-8 but it still is not working. As for your second message, you are telling me you have the opposite of my results?? Commented Apr 15, 2023 at 16:23
  • Yeah, because my locale is set up in a strange way by default. Sorry for the confusion. Commented Apr 15, 2023 at 16:27
  • I guess that bash regex (POSIX regex) is simply not suitable for Unicode at all. Commented Apr 15, 2023 at 16:35
  • I'm getting significantly different results with bash version 5.2.15(3)-release (x86_64-pc-cygwin) and LC_CTYPE=en_GB.UTF-8. Unfortunately as I don't know Japanese I can only re-run your tests. My results, with capitalisation for differences: NOT / is / NOT / null / not / null / NOT / null / NOT / is / not / null Commented Dec 31, 2023 at 12:41

1 Answer

Here is the script you want:

    string1="する"
    string2="スル"

    if echo "$string1" | grep -P '[ぁ-ゟ]' >/dev/null; then
        echo 'is hiragana'
    else
        echo 'is not hiragana'
    fi

    if echo "$string2" | grep -P '[ぁ-ゟ]' >/dev/null; then
        echo 'is hiragana'
    else
        echo 'is not hiragana'
    fi

    if echo "$string1" | grep -P '[゠-ヿㇰ-ㇿ!-○]' >/dev/null; then
        echo 'is katakana'
    else
        echo 'is not katakana'
    fi

    if echo "$string2" | grep -P '[゠-ヿㇰ-ㇿ!-○]' >/dev/null; then
        echo 'is katakana'
    else
        echo 'is not katakana'
    fi
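
For comparison, here is a rough sketch of a different approach that avoids bracket ranges altogether. It assumes the mismatch in the question comes from how ranges are interpreted under the active locale, which this thread does not confirm. Instead of a regex, it reads the raw UTF-8 bytes with od, decodes them into code points, and compares those numerically. The function names (codepoints, has_codepoint_in, is_hiragana, is_katakana) are made up for this illustration, and only the first two katakana ranges from the question are covered.

```shell
#!/usr/bin/env bash

# Print the Unicode code points (decimal, one per line) of a UTF-8 string.
# The bytes are read through od, so the result does not depend on the locale.
# Assumes well-formed UTF-8 input; stray continuation bytes are skipped.
codepoints() {
  local b cp=0 need=0
  for b in $(printf '%s' "$1" | od -An -v -tu1); do
    if (( need > 0 )); then            # continuation byte: 10xxxxxx
      cp=$(( (cp << 6) | (b & 0x3F) ))
      need=$(( need - 1 ))
      if (( need == 0 )); then echo "$cp"; fi
    elif (( b < 0x80 )); then          # single-byte sequence (ASCII)
      echo "$b"
    elif (( b >= 0xF0 )); then         # lead byte of a 4-byte sequence
      cp=$(( b & 0x07 )); need=3
    elif (( b >= 0xE0 )); then         # lead byte of a 3-byte sequence
      cp=$(( b & 0x0F )); need=2
    elif (( b >= 0xC0 )); then         # lead byte of a 2-byte sequence
      cp=$(( b & 0x1F )); need=1
    fi
  done
}

# Succeed if any code point of $1 lies in the inclusive range [$2, $3].
has_codepoint_in() {
  local lo=$2 hi=$3 cp
  for cp in $(codepoints "$1"); do
    if (( cp >= lo && cp <= hi )); then return 0; fi
  done
  return 1
}

# U+3041..U+309F corresponds to [ぁ-ゟ] in the question.
is_hiragana() { has_codepoint_in "$1" 0x3041 0x309F; }

# U+30A0..U+30FF and U+31F0..U+31FF correspond to [゠-ヿ] and [ㇰ-ㇿ].
is_katakana() {
  has_codepoint_in "$1" 0x30A0 0x30FF || has_codepoint_in "$1" 0x31F0 0x31FF
}
```

With these definitions, is_hiragana "スル" returns a non-zero status and is_katakana "スル" returns zero, while both return non-zero for "僕". This is slower than a regex because it spawns od for every call, so it is meant as an illustration of the idea rather than a drop-in replacement.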
  • Use grep -q instead of grep > /dev/null Commented Dec 31, 2023 at 5:49
  • (1) So you're saying that it can't be done in pure bash? If so, you should say so. (2) And do you have a reference or source for saying that it can't be done in pure bash, or are you just offering a grep solution that works? (3) Why do you believe that you need to use -P? (4) Why are you not using here strings? (5) printf "%s" would be better than echo. (6) Rather than repeating code with different data, consider defining functions. Commented Dec 31, 2023 at 6:00
  • Also (7) consider using grep -q instead of redirecting output to /dev/null. Commented Dec 31, 2023 at 6:00
  • @g-man-says-reinstate-monica If you didn't like my code, why didn't you post your own answer for the OP instead of complaining about what is already done? Commented May 19, 2024 at 3:41
