In the previous chapter, we saw how to merge multiple PDF documents. In this chapter, we will see how to generate an image from a page of a PDF document.
The PDFBox library provides a class named PDFRenderer, which renders a page of a PDF document into an AWT BufferedImage.
Following are the steps to generate an image from a PDF document.
Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Since it is a static method, you can invoke it using the class name, as shown below.
File file = new File("path of the document");
PDDocument document = PDDocument.load(file);
The PDFRenderer class renders the pages of a PDF document into AWT BufferedImage objects. Instantiate this class by passing the document object created in the previous step to its constructor, as shown below.
PDFRenderer renderer = new PDFRenderer(document);
You can render a particular page as an image using the renderImage() method of the PDFRenderer class. Pass this method the zero-based index of the page to be rendered; index 0 refers to the first page.
BufferedImage image = renderer.renderImage(0);
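Called with only a page index, renderImage() uses the default scale, which in PDFBox 2.x corresponds to 72 DPI, so one PDF point becomes roughly one pixel. A higher resolution can be requested with renderer.renderImageWithDPI(0, 300). The following is a minimal sketch of the size arithmetic only, using plain JDK code. The class name RenderSizeEstimate and the method pixels() are hypothetical, and the rounding is an assumption, so treat the results as estimates rather than the exact dimensions the library will produce.

```java
public class RenderSizeEstimate {

    // PDF page sizes are expressed in points; 72 points make one inch.
    static final double POINTS_PER_INCH = 72.0;

    // Estimated pixel length of a page side given in points, at the given DPI.
    // Rounding down (with a minimum of 1 pixel) is an assumption.
    public static int pixels(double points, double dpi) {
        return (int) Math.max(Math.floor(points * dpi / POINTS_PER_INCH), 1);
    }

    public static void main(String[] args) {
        double widthPt = 612;   // US Letter width in points
        double heightPt = 792;  // US Letter height in points

        System.out.println("72 DPI:  " + pixels(widthPt, 72) + " x " + pixels(heightPt, 72));
        System.out.println("300 DPI: " + pixels(widthPt, 300) + " x " + pixels(heightPt, 300));
    }
}
```

Memory use grows with the square of the DPI, so a moderate value is usually a sensible starting point for large pages.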
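You can write the image rendered in the previous step to a file using the write() method of the ImageIO class. To this method, you need to pass three parameters: the rendered image, the name of the image format (for example, "JPEG"), and a File object representing the output file.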
ImageIO.write(image, "JPEG", new File("C:/PdfBox_Examples/myimage.jpg"));
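ImageIO.write() returns false, rather than throwing an exception, when no registered writer can handle the requested format, so it can be worth checking the result. The following is a minimal sketch of that check. The class name WriteImageCheck and the helper save() are hypothetical, and a blank BufferedImage stands in for the image returned by renderImage() so that the example runs without a PDF file.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class WriteImageCheck {

    // Writes the image and fails loudly if no ImageIO writer accepted it.
    public static File save(BufferedImage image, String format, File target) throws IOException {
        boolean written = ImageIO.write(image, format, target);
        if (!written) {
            throw new IOException("No ImageIO writer found for format " + format);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the BufferedImage returned by renderImage()
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);

        // A temporary file is used here instead of a fixed path
        File target = File.createTempFile("myimage", ".jpg");
        save(image, "JPEG", target);

        // Read the file back to confirm that it decodes to the same dimensions
        BufferedImage reloaded = ImageIO.read(target);
        System.out.println(reloaded.getWidth() + " x " + reloaded.getHeight());
    }
}
```

JPEG is a lossy format; if the page contains sharp text or line art, passing "PNG" as the format name is a common alternative.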
Finally, close the document using the close() method of the PDDocument class as shown below.
document.close();
Suppose we have a PDF document, sample.pdf, in the path C:\PdfBox_Examples\, and its first page contains an image.
This example demonstrates how to convert the above PDF document into an image file. Here, we will render the first page of the PDF document and save it as myimage.jpg. Save this code as PdfToImage.java.
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public class PdfToImage {

   public static void main(String args[]) throws Exception {

      //Loading an existing PDF document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      //Instantiating the PDFRenderer class
      PDFRenderer renderer = new PDFRenderer(document);

      //Rendering an image from the PDF document
      BufferedImage image = renderer.renderImage(0);

      //Writing the image to a file
      ImageIO.write(image, "JPEG", new File("C:/PdfBox_Examples/myimage.jpg"));

      System.out.println("Image created");

      //Closing the document
      document.close();
   }
}
Compile and execute the saved Java file from the command prompt using the following commands.
javac PdfToImage.java
java PdfToImage
Upon execution, the above program renders the first page of the given PDF document as an image and displays the following message.
Image created
If you verify the given path, you can observe that the image is generated and saved as myimage.jpg.