PDFBox - Document Properties



Like other files, a PDF document has document properties. These properties are key-value pairs, and each one records a particular piece of information about the document.

Following are the properties of a PDF document:

S.No.  Property     Description
1      File         Holds the name of the file.
2      Title        Sets the title of the document.
3      Author       Sets the name of the author of the document.
4      Subject      Specifies the subject of the PDF document.
5      Keywords     Lists the keywords by which the document can be searched.
6      Created      Sets the date on which the document was created.
7      Modified     Sets the date on which the document was modified.
8      Application  Names the application that created the document.

Following is a screenshot of the document properties table of a PDF document.

[Screenshot: PDF properties]
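The labels in the table above are the ones a viewer shows, and they do not always match the key names stored in the PDF's Info dictionary. The sketch below records the correspondence as a plain map. The class and method names are hypothetical, nothing in it calls PDFBox, and it assumes that the "Application" label corresponds to the Creator entry that setCreator() writes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InfoKeySketch {

    // Viewer label (as listed in the table above) -> key in the PDF Info dictionary.
    // "File" is left out because the file name is not stored in the Info dictionary.
    static Map<String, String> labelToInfoKey() {
        Map<String, String> keys = new LinkedHashMap<>();
        keys.put("Title", "Title");
        keys.put("Author", "Author");
        keys.put("Subject", "Subject");
        keys.put("Keywords", "Keywords");
        keys.put("Created", "CreationDate");
        keys.put("Modified", "ModDate");
        keys.put("Application", "Creator"); // assumption: the viewer's "Application" is Creator
        return keys;
    }

    public static void main(String[] args) {
        // Print each label next to the dictionary key it maps to
        labelToInfoKey().forEach((label, key) ->
            System.out.println(label + " -> /" + key));
    }
}
```

A LinkedHashMap is used only so that the entries print in the same order as the table.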

Setting the Document Properties

PDFBox provides a class named PDDocumentInformation. Its setter methods assign values to the various properties of a document, and its getter methods retrieve those values.

Following are the setter methods of the PDDocumentInformation class.

S.No.  Method                             Description
1      setAuthor(String author)           Sets the value of the Author property of the PDF document.
2      setTitle(String title)             Sets the value of the Title property of the PDF document.
3      setCreator(String creator)         Sets the value of the Creator property of the PDF document.
4      setSubject(String subject)         Sets the value of the Subject property of the PDF document.
5      setCreationDate(Calendar date)     Sets the value of the CreationDate property of the PDF document.
6      setModificationDate(Calendar date) Sets the value of the ModificationDate property of the PDF document.
7      setKeywords(String keywords)       Sets the value of the Keywords property of the PDF document.
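The two date setters take a java.util.Calendar, and Calendar counts months from zero, so a literal 11 in the month position means December rather than November. The sketch below uses only the JDK to show the difference. The class and helper names are hypothetical.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class CalendarMonthSketch {

    // Builds a date-only Calendar; the month argument is zero-based (0 = January).
    static Calendar dateOf(int year, int month, int day) {
        return new GregorianCalendar(year, month, day);
    }

    public static void main(String[] args) {
        // The literal 11 means December, because January is 0.
        Calendar literal = dateOf(2015, 11, 5);
        System.out.println(literal.get(Calendar.MONTH) == Calendar.DECEMBER); // true

        // A named constant makes the intended month explicit.
        Calendar named = dateOf(2015, Calendar.NOVEMBER, 5);
        System.out.println(named.get(Calendar.MONTH) + 1); // 11
    }
}
```

Using the named constants (Calendar.JANUARY through Calendar.DECEMBER) when building the value passed to setCreationDate() or setModificationDate() avoids the off-by-one mistake.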

Example

The PDDocumentInformation class provides methods that set the properties of a document and methods that retrieve them.

This example demonstrates how to add properties such as Author, Title, Subject, and the creation and modification dates to a PDF document. It creates a PDF document named doc_attributes.pdf, adds various attributes to it, and saves it in the path C:/PdfBox_Examples/. Save this code in a file named AddingDocumentAttributes.java, so that the file name matches the name of the public class.

import java.io.IOException; 
import java.util.Calendar; 
import java.util.GregorianCalendar;
  
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;

public class AddingDocumentAttributes {
   public static void main(String args[]) throws IOException {

      //Creating PDF document object
      PDDocument document = new PDDocument();

      //Creating a blank page
      PDPage blankPage = new PDPage();
       
      //Adding the blank page to the document
      document.addPage( blankPage );

      //Getting the PDDocumentInformation object of the document
      PDDocumentInformation pdd = document.getDocumentInformation();

      //Setting the author of the document
      pdd.setAuthor("Howcodex");
       
      // Setting the title of the document
      pdd.setTitle("Sample document"); 
       
      //Setting the creator of the document 
      pdd.setCreator("PDF Examples"); 
       
      //Setting the subject of the document 
      pdd.setSubject("Example document"); 
       
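      //Setting the created date of the document
      //(Calendar months are zero-based, so named constants are used)
      Calendar creationDate = new GregorianCalendar(2015, Calendar.NOVEMBER, 5);
      pdd.setCreationDate(creationDate);

      //Setting the modified date of the document
      Calendar modificationDate = new GregorianCalendar(2016, Calendar.JUNE, 5);
      pdd.setModificationDate(modificationDate);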
       
      //Setting keywords for the document 
      pdd.setKeywords("sample, first example, my pdf"); 
 
      //Saving the document 
      document.save("C:/PdfBox_Examples/doc_attributes.pdf");

      System.out.println("Properties added successfully ");
       
      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac AddingDocumentAttributes.java
java AddingDocumentAttributes

Upon execution, the above program adds all the specified attributes to the document and displays the following message.

Properties added successfully

Now, if you visit the given path, you can find the PDF created in it. Right-click the document and select the document properties option.

[Screenshot: Document properties]

This opens the document properties window, where you can see that all the properties of the document are set to the specified values.

[Screenshot: Properties menu]

Retrieving the Document Properties

You can retrieve the properties of a document using the getter methods provided by the PDDocumentInformation class.

Following are the getter methods of the PDDocumentInformation class.

S.No.  Method                 Description
1      getAuthor()            Retrieves the value of the Author property of the PDF document.
2      getTitle()             Retrieves the value of the Title property of the PDF document.
3      getCreator()           Retrieves the value of the Creator property of the PDF document.
4      getSubject()           Retrieves the value of the Subject property of the PDF document.
5      getCreationDate()      Retrieves the value of the CreationDate property of the PDF document.
6      getModificationDate()  Retrieves the value of the ModificationDate property of the PDF document.
7      getKeywords()          Retrieves the value of the Keywords property of the PDF document.
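The Keywords property is a single string, so a program that works with individual keywords has to join them before calling setKeywords() and split the string that getKeywords() returns. The sketch below assumes the comma-separated convention used in this chapter's example. The class and method names are hypothetical and nothing in it is part of PDFBox.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeywordsSketch {

    // Joins keywords into the single string that setKeywords(String) expects.
    static String join(List<String> keywords) {
        return String.join(", ", keywords);
    }

    // Splits a string such as the one returned by getKeywords() back into a list.
    // A null input (property not set) yields an empty list.
    static List<String> split(String keywords) {
        List<String> result = new ArrayList<>();
        if (keywords == null) {
            return result;
        }
        for (String part : keywords.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("sample", "first example", "my pdf"));
        System.out.println(joined);        // sample, first example, my pdf
        System.out.println(split(joined)); // [sample, first example, my pdf]
    }
}
```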

Example

This example demonstrates how to retrieve the properties of an existing PDF document. It loads the PDF document named doc_attributes.pdf, which is saved in the path C:/PdfBox_Examples/, and retrieves its properties. Save this code in a file named RetrivingDocumentAttributes.java.

import java.io.File; 
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument; 
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class RetrivingDocumentAttributes {
   public static void main(String args[]) throws IOException {
      
      //Loading an existing document
      //(PDDocument.load is the PDFBox 2.x API; PDFBox 3.x uses org.apache.pdfbox.Loader.loadPDF(file))
      File file = new File("C:/PdfBox_Examples/doc_attributes.pdf");
      PDDocument document = PDDocument.load(file);

      //Getting the PDDocumentInformation object
      PDDocumentInformation pdd = document.getDocumentInformation();

      //Retrieving the info of a PDF document
      System.out.println("Author of the document is :"+ pdd.getAuthor());
      System.out.println("Title of the document is :"+ pdd.getTitle());
      System.out.println("Subject of the document is :"+ pdd.getSubject());

      System.out.println("Creator of the document is :"+ pdd.getCreator());
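      //The date getters return java.util.Calendar objects, so format them before printing
      java.text.SimpleDateFormat dateFormat = new java.text.SimpleDateFormat("M/d/yyyy");
      System.out.println("Creation date of the document is :"+ 
         dateFormat.format(pdd.getCreationDate().getTime()));
      System.out.println("Modification date of the document is :"+ 
         dateFormat.format(pdd.getModificationDate().getTime()));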
      System.out.println("Keywords of the document are :"+ pdd.getKeywords()); 
       
      //Closing the document 
      document.close();        
   }  
}      

Compile and execute the saved Java file from the command prompt using the following commands.

javac RetrivingDocumentAttributes.java 
java RetrivingDocumentAttributes

Upon execution, the above program retrieves all the attributes of the document and displays them as shown below.

Author of the document is :Howcodex 
Title of the document is :Sample document 
Subject of the document is :Example document 
Creator of the document is :PDF Examples 
Creation date of the document is :11/5/2015
Modification date of the document is :6/5/2016
Keywords of the document are :sample, first example, my pdf
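The output above comes from a document in which every property was set. For other documents, a getter of PDDocumentInformation returns null when the property is absent, and string concatenation then prints the word null. The sketch below shows one possible way to print a placeholder instead. The class, the describe() helper, and the "(not set)" text are all hypothetical, and plain strings stand in for the values the getters would return.

```java
import java.util.Objects;

class MissingPropertySketch {

    // Builds the line to print for one property, substituting a placeholder for null.
    static String describe(String label, String value) {
        return label + " of the document is :" + Objects.toString(value, "(not set)");
    }

    public static void main(String[] args) {
        // Stand-ins for pdd.getAuthor() and pdd.getSubject()
        String author = "Howcodex";
        String subject = null; // simulates a document without a Subject property

        System.out.println(describe("Author", author));
        System.out.println(describe("Subject", subject));
    }
}
```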