I am pretty sure my regexes are fine, but they don't behave as expected in bash. I built them myself from https://unicode.org/charts/. As you will see, they work properly with awk.
Here are the ranges to spare you the need to check them yourself, especially if you don't know Japanese:
hiragana [ぁ-ゟ]
- ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖ>゙>゚__ゝゞゟ
katakana [゠-ヿㇰ-ㇿ!-○]
- ゠ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ・ーヽヾヿ
- ㇰㇱㇲㇳㇴㇵㇶㇷㇸㇹㇺㇻㇼㇽㇾㇿ
- !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ¢£¬ ̄¦¥₩│←↑→↓■○
I also have a regex to find kanji, [一-龥], but this one works as intended in bash.
The '>>> wrong!' markers are comments I added to pinpoint where the problems are.
    [[ "する" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
    is hiragana
    echo 'する' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'
    is hiragana
    [[ "スル" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
    is hiragana >>> wrong!
    echo 'スル' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'
    [[ "僕" =~ [ぁ-ゟ] ]] && echo 'is hiragana' || echo 'is not hiragana'
    is not hiragana
    echo '僕' | awk '/[ぁ-ゟ]/ {print "is hiragana"}'
    [[ "する" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
    is katakana >>> wrong!
    echo 'する' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'
    [[ "スル" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
    is katakana
    echo 'スル' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'
    is katakana
    [[ "僕" =~ [゠-ヿㇰ-ㇿ!-○] ]] && echo 'is katakana' || echo 'is not katakana'
    is not katakana
    echo '僕' | awk '/[゠-ヿㇰ-ㇿ!-○]/ {print "is katakana"}'

It's as if bash considers hiragana and katakana to be equivalent, as though it converts them beforehand or something?
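As a cross-check that goes through neither bash's =~ nor awk, the code points can be compared numerically. This is only a rough sketch, not a fix: `codepoint` and `script_of` are hypothetical helper names I made up, it assumes `iconv` and `od` are installed and that the input is UTF-8, it looks only at the first character of its argument, and it covers only the first two katakana ranges (U+30A0 to U+30FF and U+31F0 to U+31FF), leaving out the halfwidth/fullwidth one.

```shell
# Print the decimal Unicode code point of the first character of $1.
# Assumes $1 is valid UTF-8 and that iconv and od are on PATH.
codepoint() {
  # UTF-32BE encodes each character as exactly 4 bytes, so the first
  # 4 bytes read as one hex number are the code point.
  hex=$(printf '%s' "$1" | iconv -f UTF-8 -t UTF-32BE | od -An -tx1 -N4 | tr -d ' \t\n')
  printf '%d\n' "0x$hex"
}

# Classify the first character of $1 by comparing its code point with
# the range endpoints used in the regexes (hypothetical helper).
script_of() {
  cp=$(codepoint "$1")
  if [ "$cp" -ge $((0x3041)) ] && [ "$cp" -le $((0x309F)) ]; then
    echo hiragana                       # ぁ (U+3041) to ゟ (U+309F)
  elif [ "$cp" -ge $((0x30A0)) ] && [ "$cp" -le $((0x30FF)) ]; then
    echo katakana                       # ゠ (U+30A0) to ヿ (U+30FF)
  elif [ "$cp" -ge $((0x31F0)) ] && [ "$cp" -le $((0x31FF)) ]; then
    echo katakana                       # ㇰ (U+31F0) to ㇿ (U+31FF)
  elif [ "$cp" -ge $((0x4E00)) ] && [ "$cp" -le $((0x9FA5)) ]; then
    echo kanji                          # 一 (U+4E00) to 龥 (U+9FA5)
  else
    echo other
  fi
}

for word in する スル 僕; do
  printf '%s: %s\n' "$word" "$(script_of "$word")"
done
```

Because the comparison is done on plain integers, the result does not depend on how a regex engine interprets a range like [ぁ-ゟ], so it gives a baseline to compare the bash and awk outputs against.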
bash version 5.2.15(3)-release (x86_64-pc-cygwin) and LC_CTYPE=en_GB.UTF-8. Unfortunately, as I don't know Japanese, I can only re-run your tests. My results, with capitalisation for differences: NOT / is / NOT / null / not / null / NOT / null / NOT / is / not / null