
Extracting images from a PDF


hbghlyj Posted at 2022-8-31 05:01:12
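How can I extract all images from a PDF document at their original resolution and in their original format? (That is, extract a TIFF as TIFF, a JPEG as JPEG, and so on, without resampling.) The position of the source image on the page does not matter.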

Solution in Python
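One option is the module PyMuPDF, which worked out of the box and is fast, although the approach tried with it wrote all images as .png files. The code below uses PyPDF2 together with Pillow instead: JPEG and JPEG 2000 streams are written out unchanged, and only raw (FlateDecode) pixel data is wrapped in a .png file.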
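from PIL import Image
from PyPDF2 import PdfReader  # PyPDF2 >= 2.0 (get_object / get_data)


def extract_image(pdf_file_path):
    reader = PdfReader(pdf_file_path)
    # Walk every page, not only the first one.
    for page_number, page in enumerate(reader.pages, start=1):
        resources = page["/Resources"]
        if "/XObject" not in resources:
            continue  # no XObjects, so no images on this page
        x_object = resources["/XObject"].get_object()

        for obj in x_object:
            image = x_object[obj]
            if image["/Subtype"] != "/Image":
                continue
            # Prefix with the page number so names such as /Im0 do not collide.
            name = f"page{page_number}_{obj[1:]}"
            image_filter = image.get("/Filter")
            data = image.get_data()

            if image_filter == "/FlateDecode":
                # Raw pixel data (8 bits per component assumed): save as PNG.
                size = (image["/Width"], image["/Height"])
                color_space = image.get("/ColorSpace")
                if color_space == "/DeviceRGB":
                    mode = "RGB"
                elif color_space == "/DeviceGray":
                    mode = "L"
                else:
                    mode = "P"  # best guess; the palette itself is not restored
                img = Image.frombytes(mode, size, data)
                img.save(name + ".png")
            elif image_filter == "/DCTDecode":
                # The stream is already a complete JPEG file.
                with open(name + ".jpg", "wb") as f:
                    f.write(data)
            elif image_filter == "/JPXDecode":
                # The stream is already a complete JPEG 2000 file.
                with open(name + ".jp2", "wb") as f:
                    f.write(data)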
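
For comparison, here is a minimal sketch of the PyMuPDF route mentioned above, assuming a recent PyMuPDF where Document.extract_image(xref) returns a dict with the image bytes under "image" and a suggested extension under "ext". As far as I know JPEG data comes back as stored, while other encodings are typically converted to PNG, so check the results on your own files. The names output_name and extract_images_pymupdf and the file naming scheme are made up for this example.

```python
from pathlib import Path


def output_name(page_number, xref, ext):
    """Build a file name that is unique per image object (hypothetical scheme)."""
    return f"page{page_number}_xref{xref}.{ext}"


def extract_images_pymupdf(pdf_path, out_dir="."):
    """Write each embedded image once, using the extension PyMuPDF reports."""
    import fitz  # PyMuPDF; imported here so output_name works without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    seen = set()  # an image shared by several pages has a single xref
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            for image_info in page.get_images(full=True):
                xref = image_info[0]  # first field of each entry is the xref
                if xref in seen:
                    continue
                seen.add(xref)
                extracted = doc.extract_image(xref)  # dict with "image", "ext"
                target = out_dir / output_name(page_number, xref, extracted["ext"])
                target.write_bytes(extracted["image"])
                written.append(target)
    return written
```

Calling extract_images_pymupdf("input.pdf", "images") would write one file per distinct image into the images folder and return the list of paths. Nothing is resampled here: the bytes PyMuPDF hands back are written to disk as they are.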
