Extracting images from a PDF

hbghlyj posted 2022-8-31 05:01
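How can I extract all images from a PDF document at their original resolution and in their original format? (That is, extract TIFF as TIFF, JPEG as JPEG, and so on, with no resampling.) Where the source image sits on the page does not matter.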

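Solution in Python
One option is the module PyMuPDF; the usual snippet for it outputs all images as .png files, but it works out of the box and is fast. The code below takes a different route and uses PyPDF2 together with Pillow: JPEG and JPEG 2000 streams are written to disk unchanged, and Flate-compressed images are saved as .png.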
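from PIL import Image
from PyPDF2 import PdfReader


def extract_image(pdf_file_path):
    reader = PdfReader(pdf_file_path)
    # Walk every page, not just the first, so that all images are found.
    for page_number, page in enumerate(reader.pages, start=1):
        resources = page["/Resources"]
        if "/XObject" not in resources:
            continue  # no images on this page
        # get_object()/get_data() replace the camelCase names removed in PyPDF2 3.0.
        x_object = resources["/XObject"].get_object()

        for obj in x_object:
            image = x_object[obj]
            if image["/Subtype"] != "/Image":
                continue
            # XObject names such as /Im0 can repeat from page to page,
            # so the page number is part of the output file name.
            name = f"page{page_number}_{obj[1:]}"
            image_filter = image["/Filter"] if "/Filter" in image else None

            if image_filter == "/FlateDecode":
                # Raw pixels: this assumes 8 bits per component, either
                # DeviceRGB or a single-channel color space.
                size = (image["/Width"], image["/Height"])
                mode = "RGB" if image["/ColorSpace"] == "/DeviceRGB" else "P"
                img = Image.frombytes(mode, size, image.get_data())
                img.save(name + ".png")
            elif image_filter == "/DCTDecode":
                # JPEG stream: write the bytes unchanged.
                with open(name + ".jpg", "wb") as f:
                    f.write(image.get_data())
            elif image_filter == "/JPXDecode":
                # JPEG 2000 stream: write the bytes unchanged.
                with open(name + ".jp2", "wb") as f:
                    f.write(image.get_data())
            # Images with any other filter are skipped.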
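
The filter checks above could also be collected into one lookup, which makes it easier to see which formats are kept as-is and which are re-encoded. This is only a sketch: `FILTER_EXTENSIONS` and `extension_for_filter` are hypothetical names, not part of PyPDF2, and it assumes that when `/Filter` is a list, the last entry names the image encoding.

```python
# Hypothetical helper (not part of PyPDF2): map a PDF /Filter value
# to the file extension used when saving the image.
FILTER_EXTENSIONS = {
    "/DCTDecode": ".jpg",    # JPEG data, can be written to disk unchanged
    "/JPXDecode": ".jp2",    # JPEG 2000 data, can be written to disk unchanged
    "/FlateDecode": ".png",  # zlib-compressed raw pixels, must be re-encoded
}


def extension_for_filter(image_filter):
    """Return the extension for a /Filter value, or None if unsupported.

    Accepts a single filter name or a list of names. For a list, the
    last entry is assumed to name the image encoding.
    """
    if image_filter is None:
        return None
    if isinstance(image_filter, str):
        names = [image_filter]
    else:
        names = list(image_filter)
    if not names:
        return None
    return FILTER_EXTENSIONS.get(str(names[-1]))
```

Inside the loop, the value of the image's `/Filter` entry would be passed in, and a result of `None` would mean the image is skipped. TIFF-style streams (for example `/CCITTFaxDecode`) are deliberately left out here, since they are not handled by the code above.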
