Have you ever needed to pull out all the images from a PDF and combine them into one long image for easy viewing or sharing? Whether it’s for study notes, infographics, or just organizing your resources, this task can be handled beautifully with a bit of Python magic!
In this blog post, I’ll show you how to extract all images from a PDF and merge them vertically into one single image. Let’s dive in!
Why Would You Need This?
- Quick review: Scroll through all images from a PDF at once.
- Presentation: Share notes or diagrams as a single, long image.
- Social media: Post study materials or infographics without splitting them into multiple files.
Tools You’ll Need
To complete this project, you’ll need:
- Python (3.x)
- PyMuPDF (for extracting images from PDFs)
- Pillow (for handling and merging images)
You can install these with:
pip install pymupdf pillow
Step 1: Extract Images from PDF
First, we use PyMuPDF to extract all the images from each page of your PDF.
import fitz  # PyMuPDF
import io
from PIL import Image

pdf_path = "yourfile.pdf"
pdf_file = fitz.open(pdf_path)
images = []

for page_index in range(len(pdf_file)):
    page = pdf_file[page_index]
    # Each entry describes one image used on this page; item 0 is its xref
    image_list = page.get_images(full=True)
    for img_index, img in enumerate(image_list):
        xref = img[0]
        base_image = pdf_file.extract_image(xref)
        image_bytes = base_image["image"]
        # Decode the raw bytes into a Pillow image
        image = Image.open(io.BytesIO(image_bytes))
        images.append(image)
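One thing to watch for: a PDF can reuse the same embedded image on several pages (a logo in the header, for example), and get_images() reports it once for every page that uses it. If you'd rather keep only one copy, here is a minimal sketch that remembers which xrefs it has already seen. The helper names unique_xrefs and extract_unique_images are my own, not part of PyMuPDF, and the imports sit inside the second function only so the first one can be tried without a PDF at hand.

```python
def unique_xrefs(pages_image_lists):
    """Return each image xref once, in first-seen order.

    `pages_image_lists` holds one list per page, shaped like the result of
    page.get_images(full=True): tuples whose item 0 is the xref.
    (Hypothetical helper, not part of PyMuPDF.)
    """
    seen = set()
    ordered = []
    for image_list in pages_image_lists:
        for img in image_list:
            xref = img[0]
            if xref not in seen:
                seen.add(xref)
                ordered.append(xref)
    return ordered


def extract_unique_images(pdf_path):
    """Open the PDF and decode every distinct embedded image once."""
    import io
    import fitz  # PyMuPDF
    from PIL import Image

    pdf_file = fitz.open(pdf_path)
    per_page = [
        pdf_file[page_index].get_images(full=True)
        for page_index in range(len(pdf_file))
    ]
    images = []
    for xref in unique_xrefs(per_page):
        image_bytes = pdf_file.extract_image(xref)["image"]
        images.append(Image.open(io.BytesIO(image_bytes)))
    return images


# Two pages that share image 7 (say, a logo): it is listed only once.
print(unique_xrefs([[(7,), (12,)], [(7,), (15,)]]))  # [7, 12, 15]
```

To use it, replace the loop above with images = extract_unique_images(pdf_path). If you actually want repeated images to appear once per page, stick with the original loop.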
Step 2: Merge Images Vertically
Now, let’s stack the images vertically, each one directly below the previous one.
# The canvas is as wide as the widest image and as tall as all images combined
width = max(img.width for img in images)
total_height = sum(img.height for img in images)

# Create a blank white image large enough to hold every image
merged_image = Image.new("RGB", (width, total_height), (255, 255, 255))

current_y = 0
for img in images:
    # Paste at the left edge, then move down by this image's height
    merged_image.paste(img, (0, current_y))
    current_y += img.height

# Save the final merged image
merged_image.save("merged_images.jpg")
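To make the arithmetic concrete, here is a small sketch of the same layout calculation working on plain (width, height) pairs instead of Pillow images. stack_layout and fits_in_jpeg are names I made up for illustration. The sketch also rejects an empty list (max() raises ValueError when the PDF has no images) and flags results that are too big for JPEG, since that format caps each side at 65,535 pixels.

```python
JPEG_MAX_SIDE = 65535  # largest width or height the JPEG format can store


def stack_layout(sizes):
    """Plan a vertical stack for a list of (width, height) pairs.

    Returns (canvas_width, canvas_height, y_offsets), where y_offsets[i]
    is the top edge of image i. (Hypothetical helper for illustration.)
    """
    if not sizes:
        raise ValueError("no images to merge")
    canvas_width = max(w for w, _ in sizes)
    y_offsets = []
    current_y = 0
    for _, h in sizes:
        y_offsets.append(current_y)
        current_y += h
    return canvas_width, current_y, y_offsets


def fits_in_jpeg(width, height):
    """True if both sides are within the JPEG size limit."""
    return width <= JPEG_MAX_SIDE and height <= JPEG_MAX_SIDE


# An 800x600 image stacked above a 400x300 one
width, total_height, offsets = stack_layout([(800, 600), (400, 300)])
print(width, total_height, offsets)       # 800 900 [0, 600]
print(fits_in_jpeg(width, total_height))  # True
```

With real images you could pass [img.size for img in images], and one option when fits_in_jpeg returns False is to save the result as PNG instead.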
Step 3: Run Your Script
Just run your script! After execution, you’ll find merged_images.jpg in the folder you ran it from, containing all the images from your PDF stacked one below the other.
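If you would rather not edit pdf_path every time, one option is to read the file names from the command line. This is only a sketch using the standard argparse module; the script name merge_pdf_images.py and the -o/--output flag are my own choices, not something the script already has.

```python
import argparse


def build_parser():
    """Command-line options for the merge script (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Extract the images from a PDF and stack them vertically."
    )
    parser.add_argument("pdf_path", help="PDF file to read")
    parser.add_argument(
        "-o", "--output",
        default="merged_images.jpg",
        help="where to save the merged image (default: %(default)s)",
    )
    return parser


# Parsing an explicit list here so the example runs on its own;
# in the real script you would call build_parser().parse_args() instead.
args = build_parser().parse_args(["yourfile.pdf", "-o", "notes.png"])
print(args.pdf_path, args.output)  # yourfile.pdf notes.png
```

You would then pass args.pdf_path to fitz.open() and args.output to save(), and run it as python merge_pdf_images.py yourfile.pdf -o notes.png.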
A Few Tips & Tricks
- Handling different widths: The canvas is as wide as the widest image, so narrower images sit at the left edge with white space to their right. You may want to resize all images to the same width for uniformity.
- Large PDFs: For very large PDFs, memory usage can increase. You can process in batches if needed.
- Other formats: You can save as PNG or another format by changing the file extension passed to save().
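For the first tip, here is a rough sketch of scaling every image to one width while keeping its proportions. scaled_size and resize_to_width are hypothetical helper names; resize() is Pillow's own method and returns a new image.

```python
def scaled_size(width, height, target_width):
    """New (width, height) after scaling to target_width, keeping the aspect ratio."""
    new_height = max(1, round(height * target_width / width))
    return target_width, new_height


def resize_to_width(img, target_width):
    """Return a Pillow image scaled to target_width.

    Images that already have the right width are returned unchanged.
    """
    if img.width == target_width:
        return img
    return img.resize(scaled_size(img.width, img.height, target_width))


# A 400x300 image scaled up to match an 800-pixel-wide canvas
print(scaled_size(400, 300, 800))  # (800, 600)
```

In the merge step you could then write images = [resize_to_width(img, width) for img in images] before computing total_height, so that the heights reflect the resized images. Keep in mind that enlarging small images can make them look blurry, so you may prefer to scale everything down to the narrowest width instead.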
Full Script Example
Here’s the complete code:
import fitz  # PyMuPDF
import io
from PIL import Image

pdf_path = "yourfile.pdf"
pdf_file = fitz.open(pdf_path)
images = []

for page_index in range(len(pdf_file)):
    page = pdf_file[page_index]
    image_list = page.get_images(full=True)
    for img in image_list:
        xref = img[0]
        base_image = pdf_file.extract_image(xref)
        image_bytes = base_image["image"]
        image = Image.open(io.BytesIO(image_bytes))
        images.append(image)

if images:
    width = max(img.width for img in images)
    total_height = sum(img.height for img in images)
    merged_image = Image.new("RGB", (width, total_height), (255, 255, 255))
    current_y = 0
    for img in images:
        merged_image.paste(img, (0, current_y))
        current_y += img.height
    merged_image.save("merged_images.jpg")
    print("All images merged successfully!")
else:
    print("No images found in the PDF.")
Conclusion
With just a few lines of Python code, you can extract and merge all images from a PDF. This method is especially useful for students, educators, designers, and anyone who works with PDFs regularly.
Try it out, and let me know in the comments how it worked for you! If you have any questions, feel free to ask. Happy coding!
If you found this helpful, don’t forget to subscribe for more Python tricks and automation guides!