[Python] Add delimiters around consecutive non-Han characters or whitespace

hbghlyj posted on 2023-2-25 10:53
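```python
import re

# One maximal run of consecutive characters outside the CJK Unified
# Ideographs block U+4E00 to U+9FFF. Whitespace, Latin letters, digits
# and ASCII punctuation are all "non-Han", so they all belong to runs.
NON_HAN_RUN = re.compile(r"[^\u4e00-\u9fff]+")

def add_math_delimiters(text):
    """Wrap every run of consecutive non-Han characters in \\( and \\)."""
    # A function replacement inserts the delimiters literally, so their
    # backslashes are never parsed as escapes in a replacement template.
    return NON_HAN_RUN.sub(lambda m: r"\(" + m.group(0) + r"\)", text)
```

This function uses a regular expression to find every maximal run of consecutive non-Han characters, that is, characters outside the Unicode range \u4e00-\u9fff (the CJK Unified Ideographs block), and wraps each run in the inline-math delimiters \( and \). Whitespace, Latin letters, digits and punctuation all count as non-Han, so a run such as " O " between two Han characters is wrapped as a whole, surrounding spaces included.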
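To see exactly which spans count as runs of non-Han characters, the matches of the pattern [^\u4e00-\u9fff]+ can be listed with re.finditer. A small sketch on a hypothetical sample sentence (not taken from the post):

```python
import re

# A run: one or more consecutive characters outside U+4E00 to U+9FFF.
pattern = re.compile(r"[^\u4e00-\u9fff]+")

sample = "以点 O 为圆心,半径为 r."  # hypothetical sample: "centre O, radius r."

# Collect each run together with its [start, end) position in `sample`.
runs = [(m.span(), m.group(0)) for m in pattern.finditer(sample)]
for span, run in runs:
    print(span, repr(run))
```

This prints (2, 5) ' O ', then (8, 9) ',' and (12, 15) ' r.': the spaces around O belong to the same run as O, and the trailing " r." is a single run.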

Here's an example of how to use this function:
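```python
# "We start from a circle with centre O and a point N on it, and construct a
# family of triangles having circle (O) as circumcircle and N as nine-point centre."
text = "我们从一个以点 O 为圆心的圆及圆上一点 N 开始,来作出一族以圆 (O ) 为外接圆,并以点 N 为九点圆心的三角形."
result = add_math_delimiters(text)
print(result)
```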

This will output:
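```
我们从一个以点\( O \)为圆心的圆及圆上一点\( N \)开始\(,\)来作出一族以圆\( (O ) \)为外接圆\(,\)并以点\( N \)为九点圆心的三角形\(.\)
```

Note that every non-Han run is wrapped, including the spaces around "O" and "N" and punctuation-only runs such as "," and "."; the spaces do no harm because whitespace inside \( ... \) is ignored in math mode. Characters outside \u4e00-\u9fff, such as full-width punctuation (，。) or CJK Extension A ideographs, are also treated as non-Han and get wrapped.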
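If punctuation-only runs such as "," and "." should stay outside math mode, one possible refinement is to wrap only the runs that contain at least one letter or digit, and to keep the surrounding spaces outside the delimiters. This is a sketch of a variant, not the function from the post: the names add_math_delimiters_alnum and _wrap_run are hypothetical, and using str.isalnum to decide what counts as math is an assumption.

```python
import re

# Hypothetical variant: same run definition (characters outside U+4E00 to U+9FFF),
# but runs with no letter or digit are left as they are, and leading/trailing
# whitespace is kept outside the delimiters.
NON_HAN_RUN = re.compile(r"[^\u4e00-\u9fff]+")


def _wrap_run(match):
    run = match.group(0)
    core = run.strip()
    if not any(ch.isalnum() for ch in core):
        # Pure punctuation or whitespace, e.g. "," or ".": leave unchanged.
        return run
    lead = run[: len(run) - len(run.lstrip())]  # leading whitespace
    trail = run[len(run.rstrip()):]             # trailing whitespace
    return lead + r"\(" + core + r"\)" + trail


def add_math_delimiters_alnum(text):
    """Wrap only the non-Han runs that contain a letter or digit."""
    return NON_HAN_RUN.sub(_wrap_run, text)


text = "我们从一个以点 O 为圆心的圆及圆上一点 N 开始,来作出一族以圆 (O ) 为外接圆,并以点 N 为九点圆心的三角形."
print(add_math_delimiters_alnum(text))
```

On the example sentence this gives 我们从一个以点 \(O\) 为圆心的圆及圆上一点 \(N\) 开始,来作出一族以圆 \((O )\) 为外接圆,并以点 \(N\) 为九点圆心的三角形. with the commas and the final full stop left outside math mode.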

悠闲数学娱乐论坛 (3rd edition)