Solved: How to recognize Chinese characters in a string？

lehaofeng · Jul 3, 2023 04:11 AM

Hi, I have a problem with regex.

For example:

regex("零件2305wu","[\u4e00-\u9fa5]+")

I want to extract "零件", but the output is "2305".

Thanks!
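
One possible explanation for the "2305" result, offered as an assumption rather than documented JMP behavior: if the regex engine does not understand the \u escape and treats \u as a plain u, the class becomes [u4e00-u9fa5], which contains the range 0-u (digits and most ASCII letters) but no Chinese characters. The sketch below uses Python's re module, not JSL, to show that this literal class returns exactly "2305" for the sample string.

```python
import re

text = "零件2305wu"

# Assumption: the backslashes are ignored, so the class is read literally as
# the characters u, 4, e, 0, 9, f, a, 5 plus the range "0" through "u".
literal_class = re.compile(r"[u4e00-u9fa5]+")

# "零" and "件" are not in the class, so the first match starts at "2".
# "w" sorts after "u", so the match stops before it.
first_match = literal_class.search(text).group(0)
print(first_match)  # 2305
```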

lehaofeng · Jul 3, 2023 1:59 AM

I seem to have found a solution:

regex("零件2305wu","[^\x00-\xff]+")

Craige_Hales · Jul 3, 2023 10:01 AM

Nice! I was going to suggest something similar:

regex("零件2305wu","[^\x01-\x7f]+")

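The characters from x00 to x7F are ASCII; x00 might make a note in the log, so maybe start at x01. Characters from x80 to xFF are the Latin-1 range (mostly accented letters and symbols such as é); they are not ASCII, but you might want to exclude them from the match as well, as your pattern does. Everything beyond xFF is the rest of Unicode, which includes the Chinese characters.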

The [ square brackets ] make a character set, and the leading ^ means not in this set. The minus means a range. The + means one or more.

Craige
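
As a cross-check of the two negated character sets above, here is a minimal sketch in Python's re module (not JSL, so escape handling in JMP may differ). The string "café零件" is a made-up test value, added only to show where the two patterns disagree.

```python
import re

text = "零件2305wu"

# "Not ASCII": anything outside x01-x7F.
non_ascii = re.compile(r"[^\x01-\x7f]+")
# "Not ASCII and not Latin-1": anything outside x00-xFF.
non_latin1 = re.compile(r"[^\x00-\xff]+")

# Both patterns pull the same run out of the sample string.
ascii_result = non_ascii.search(text).group(0)
latin1_result = non_latin1.search(text).group(0)
print(ascii_result, latin1_result)  # 零件 零件

# Hypothetical input with an accented Latin letter (é is xE9).
accented = "café零件"
ascii_runs = non_ascii.findall(accented)
latin1_runs = non_latin1.findall(accented)
print(ascii_runs)   # ['é零件']
print(latin1_runs)  # ['零件']
```

The second pattern is the safer choice when the text may contain accented Western characters that should not be treated as Chinese.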

lala · May 3, 2024 06:12 AM

tx=regex("零件2305wu","([一-﨩]{0,})");

This works, but is it reasonable?

Thanks Experts!

Craige_Hales · May 3, 2024 09:31 AM

一 (U+4E00, https://www.google.com/search?q=unicode+%E4%B8%80) is smaller than 﨩 (U+FA29, https://www.google.com/search?q=unicode+%EF%A8%A9), so it should be a valid range. It covers about 11/16 of the 16-bit code points (U+0000 to U+FFFF), so it also matches many non-Chinese characters, such as Korean Hangul.

{0,} means zero or more, just like *.

It appears to work, keeping the first two characters and rejecting the last six.

Craige
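
To make the width of that range and the effect of {0,} concrete, here is a small sketch in Python's re module (not JSL). The strings "零件한글2305" and "2305零件" are made-up test values, and the narrower range U+4E00 to U+9FFF (the CJK Unified Ideographs block) is shown only as one possible alternative, not as the recommended JMP pattern.

```python
import re

# Share of the 16-bit code points covered by U+4E00..U+FA29.
share = (0xFA29 - 0x4E00 + 1) / 0x10000
print(round(share, 3))  # 0.673

wide = re.compile("[\u4e00-\ufa29]+")  # the range 一-﨩 from the post
cjk = re.compile("[\u4e00-\u9fff]+")   # CJK Unified Ideographs block only

# Hangul syllables fall inside the wide range, so they are matched too.
mixed = "零件한글2305"
wide_result = wide.search(mixed).group(0)
cjk_result = cjk.search(mixed).group(0)
print(wide_result)  # 零件한글
print(cjk_result)   # 零件

# {0,} allows an empty match, so a string that starts with other
# characters yields "" instead of the Chinese text further along.
zero_or_more = re.compile("[\u4e00-\ufa29]{0,}")
empty_result = zero_or_more.search("2305零件").group(0)
plus_result = wide.search("2305零件").group(0)
print(repr(empty_result))  # ''
print(plus_result)         # 零件
```

Using + instead of {0,} avoids the empty match when the Chinese characters are not at the start of the string.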
