[Python] Add delimiters around consecutive non-Han characters or whitespace

hbghlyj Posted at 2023-2-25 10:53:28
    import re

    def add_math_delimiters(text):
        # \w matches any word character; in Python 3 this includes Han characters
        # \u4e00-\u9fff is the Unicode range of the common Han characters
        # [^\W\u4e00-\u9fff]+ therefore matches a run of word characters
        # (letters, digits, underscore) that are not Han characters
        pattern = re.compile(r"[^\W\u4e00-\u9fff]+")
        # \g<0> is the whole matched run, so each run is wrapped in \( and \)
        return pattern.sub(r"\(\g<0>\)", text)

This function uses a regular expression to find each run of consecutive non-Han word characters (letters, digits, and underscores) and wraps it in the math delimiters \( and \). Whitespace, punctuation, and Han characters are left untouched.

Here's an example of how to use this function:

    text = "我们从一个以点 O 为圆心的圆及圆上一点 N 开始,来作出一族以圆 (O ) 为外接圆,并以点 N 为九点圆心的三角形."
    result = add_math_delimiters(text)
    print(result)

This will output:

    我们从一个以点 \(O\) 为圆心的圆及圆上一点 \(N\) 开始,来作出一族以圆 (\(O\) ) 为外接圆,并以点 \(N\) 为九点圆心的三角形.

Note that every occurrence of the Latin letters "O" and "N" has been surrounded by \( and \), while the parentheses, spaces, and punctuation around them are unchanged.
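
One detail worth checking is how \w treats Han characters. In Python 3, \w is Unicode-aware for str patterns, so it matches Han characters as well as Latin letters. The sketch below compares plain \w+ with a character class that excludes the \u4e00-\u9fff range. The helper wrap_runs and the short sample string are hypothetical, chosen only for illustration, and are not part of the original post.

```python
import re

def wrap_runs(pattern, text):
    # Wrap every match of `pattern` in the math delimiters.
    # A callable replacement avoids any escaping rules of replacement templates.
    return re.sub(pattern, lambda m: r"\(" + m.group(0) + r"\)", text)

sample = "以点 O 为圆心"

# \w is Unicode-aware for str patterns, so Han characters count as word characters.
with_w = wrap_runs(r"\w+", sample)

# Excluding the Han range leaves only the Latin letter to be wrapped.
without_han = wrap_runs(r"[^\W\u4e00-\u9fff]+", sample)

print(with_w)       # \(以点\) \(O\) \(为圆心\)
print(without_han)  # 以点 \(O\) 为圆心
```

If the text also contains Han characters outside \u4e00-\u9fff (for example from the CJK extension blocks), the excluded range would need to be widened accordingly.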

Leisure Math Forum