|
- import re
- def add_math_delimiters(text):
-     # Match each maximal run of ASCII letters, digits, or underscores.
-     # re.ASCII restricts \w to [a-zA-Z0-9_], so Han characters (which Unicode
-     # \w would otherwise match) are left untouched.
-     pattern = re.compile(r"\w+", re.ASCII)
-     # \g<0> is the whole matched run; wrap it in \( and \)
-     return pattern.sub(r"\(\g<0>\)", text)
This function uses a regular expression to find each maximal run of ASCII letters, digits, and underscores and wraps it in the math delimiters \( and \). The re.ASCII flag matters here: without it, \w also matches Han characters in Python 3, so the surrounding Chinese text would be wrapped as well. The pattern consumes the run itself, so \g<0> in the replacement is the text that ends up between the delimiters.
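One detail worth checking when wrapping text with re.sub: only the characters a match consumes are replaced. A pattern made solely of lookahead and lookbehind assertions matches a zero-width position, so \g<0> is empty and the replacement inserts an empty \(\) pair instead of enclosing any text. The following is a minimal sketch of that difference; the sample string and both patterns are assumptions chosen for illustration.

```python
import re

sample = "圆 (O ) 与点 N"

# Zero-width pattern: the lookarounds assert a position but consume no
# characters, so \g<0> is the empty string and an empty \(\) is inserted.
zero_width = re.compile(r"(?<=\()(?=O)")
inserted = zero_width.sub(r"\(\g<0>\)", sample)

# Consuming pattern: the run itself is the match, so \g<0> holds the text
# that ends up between the delimiters.
consuming = re.compile(r"[A-Za-z0-9_]+")
wrapped = consuming.sub(r"\(\g<0>\)", sample)

print(inserted)  # 圆 (\(\)O ) 与点 N
print(wrapped)   # 圆 (\(O\) ) 与点 \(N\)
```

If the delimiters should enclose the symbol, the pattern has to consume it; lookarounds are then only useful for constraining what sits next to the match.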
Here's an example of how to use this function:
- text = "我们从一个以点 O 为圆心的圆及圆上一点 N 开始,来作出一族以圆 (O ) 为外接圆,并以点 N 为九点圆心的三角形."
- result = add_math_delimiters(text)
- print(result)
This will output:
- 我们从一个以点 \(O\) 为圆心的圆及圆上一点 \(N\) 开始,来作出一族以圆 (\(O\) ) 为外接圆,并以点 \(N\) 为九点圆心的三角形.
Note that every Latin letter ("O" and "N") is now enclosed in the math delimiters \( and \), while the Han characters, whitespace, and punctuation are unchanged. |
|