|
Last edited by hbghlyj on 2023-8-30 16:51. Using the method from this post to look up the CMap table.
The character to look up is "\uD880\uDFC0".
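For reference, "\uD880\uDFC0" is a UTF-16 surrogate pair. A minimal Python sketch of the standard surrogate arithmetic shows which code point it encodes (the helper name `surrogates_to_codepoint` is made up for illustration and is not part of the original post):

```python
def surrogates_to_codepoint(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # 10 bits come from the high surrogate, 10 bits from the low surrogate.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = surrogates_to_codepoint(0xD880, 0xDFC0)
print(f"U+{cp:04X}")  # U+303C0
```

U+303C0 lies in the CJK Unified Ideographs Extension G block (U+30000 to U+3134F), which is consistent with the font name used below.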
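- import fitz  # PyMuPDF
- doc = fitz.open("U30000.pdf")
- page = doc[13]
- font = "BBEGEF+UCS_G_ExtGv1"
- def get_key(xref, key):
-     # xref_get_key returns (type, value); take the object number from e.g. "123 0 R"
-     return int(doc.xref_get_key(xref, key)[1].split()[0].replace("[", ""))
- for font_tuple in page.get_fonts():
-     if font_tuple[3] == font:  # font_tuple[3] is the base font name
-         # decode the font's ToUnicode CMap stream and search it line by line
-         for line in doc.xref_stream(get_key(font_tuple[0], 'ToUnicode')).decode().splitlines():
-             if "D880DFC0" in line:
-                 print(line)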
Output:
- <07d4> <07d4> [<D880DFC0>]
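The line printed above has the shape of a `bfrange` entry in a ToUnicode CMap: a start code, an end code, and a bracketed list of UTF-16BE destination strings, one per code. Below is a minimal sketch of decoding such a line using only the standard library. The function name `parse_bfrange_array_line` is hypothetical, and it handles only this bracketed-array form, not the `<lo> <hi> <dst>` form:

```python
import re

def parse_bfrange_array_line(line: str) -> dict[int, str]:
    """Map each source code in '<lo> <hi> [<dst> ...]' to its Unicode string."""
    m = re.fullmatch(r"\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*\[(.*)\]\s*", line)
    if m is None:
        raise ValueError("not a bracketed bfrange line")
    lo, hi = int(m.group(1), 16), int(m.group(2), 16)
    dsts = re.findall(r"<([0-9A-Fa-f]+)>", m.group(3))
    if len(dsts) != hi - lo + 1:
        raise ValueError("destination count does not match the code range")
    # Destination strings are UTF-16BE, so a surrogate pair decodes to one character.
    return {lo + i: bytes.fromhex(d).decode("utf-16-be") for i, d in enumerate(dsts)}

mapping = parse_bfrange_array_line("<07d4> <07d4> [<D880DFC0>]")
for code, text in mapping.items():
    print(f"{code:#06x} -> U+{ord(text):04X}")  # 0x07d4 -> U+303C0
```

Read this way, the output says that code 0x07d4 in this font maps to the character U+303C0.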
Extract the font with extract_font:
- import fitz  # PyMuPDF
- doc = fitz.open("U30000.pdf")
- # extract_font takes the font's xref and returns (name, ext, type, content)
- name, ext, _, content = doc.extract_font(40)
- with open(name + "." + ext, "wb") as ofile:
-     ofile.write(content)
This produces BBEGEF+UCS_G_ExtGv1.ttf |
|