[Python]Add delimiters around consecutive non-Han characters or whitespace

hbghlyj · 2023-2-25 10:53

import re

def add_math_delimiters(text):
    # [^\W...] is a double negative: it matches a word character (\w: letter, digit
    # or underscore) that is not in \u4e00-\u9fff, the basic block of Han characters.
    # In Python 3, \w on its own also matches Han characters, so the range is excluded.
    pattern = re.compile(r"[^\W\u4e00-\u9fff]+")
    # \\ in the replacement template produces one literal backslash
    return pattern.sub(r"\\(\g<0>\\)", text)

This function uses a regular expression to find each maximal run of consecutive word characters (letters, digits, underscores) that are not Han characters, and wraps the run in the math delimiters \( and \). Whitespace, parentheses and punctuation are not word characters, so they stay outside the delimiters. The pattern has to consume the characters themselves: a pattern built only from lookahead and lookbehind assertions matches empty positions, so \g<0> would be empty and the substitution would insert "\(\)" pairs instead of wrapping anything.

Here's an example of how to use this function:

text = "我们从一个以点 O 为圆心的圆及圆上一点 N 开始，来作出一族以圆 (O ) 为外接圆，并以点 N 为九点圆心的三角形."
result = add_math_delimiters(text)
print(result)

This will output:

我们从一个以点 \(O\) 为圆心的圆及圆上一点 \(N\) 开始，来作出一族以圆 (\(O\) ) 为外接圆，并以点 \(N\) 为九点圆心的三角形.

Note that every occurrence of the non-Han letters "O" and "N" is now wrapped in \( and \), while the parentheses and the space in "(O )" remain outside the delimiters.
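If the goal is closer to the thread title, that is, treating a whole run such as "(O )" as one formula, a variant along the following lines might work. This is only a sketch under stated assumptions: the name wrap_non_han_runs is hypothetical, "Han" is taken to mean the block U+4E00 to U+9FFF, and CJK/fullwidth punctuation (U+3000 to U+303F, U+FF00 to U+FFEF) is assumed to belong to the Chinese text rather than to the math.

```python
import re

# Assumption: characters in these ranges count as Chinese text, not math:
# basic Han block, CJK punctuation, fullwidth forms.
CHINESE = r"\u4e00-\u9fff\u3000-\u303f\uff00-\uffef"

# A token is a run of characters that are neither Chinese nor whitespace.
TOKEN = rf"[^{CHINESE}\s]+"

# A run is one token, optionally followed by more tokens separated by
# spaces or tabs, so inner whitespace is kept but outer whitespace is not.
RUN = re.compile(rf"{TOKEN}(?:[ \t]+{TOKEN})*")

def wrap_non_han_runs(text):
    # A callable replacement sidesteps escape handling in the template.
    return RUN.sub(lambda m: r"\(" + m.group(0) + r"\)", text)

print(wrap_non_han_runs("以圆 (O ) 为外接圆"))  # 以圆 \((O )\) 为外接圆
```

One caveat of this sketch: ASCII punctuation belonging to the sentence, such as a final ".", is also non-Han and would be wrapped as well, so real text may need an extra exclusion list.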
