Spring Batch - Readers, Writers & Processors


Advertisements

An Item Reader reads data into the spring batch application from a particular source, whereas an Item Writer writes data from Spring Batch application to a particular destination.

An Item processor is a class which contains the processing code which processes the data read in to the spring batch. If the application reads n records the code in the processor will be executed on each record.

A chunk is a child element of the tasklet. It is used to perform read, write, and processing operations. We can configure reader, writer, and processors using this element, within a step as shown below.

<batch:job id = "helloWorldJob"> 
   <batch:step id = "step1"> 
      <batch:tasklet> 
         <batch:chunk reader = "cvsFileItemReader" writer = "xmlItemWriter" 
            processor = "itemProcessor" commit-interval = "10"> 
         </batch:chunk> 
      </batch:tasklet> 
   </batch:step> 
</batch:job>

Spring Batch provides readers and writers to read and write data form various file systems/databases such as MongoDB, Neo4j, MySQL, XML, flatfile, CSV, etc.

To include a reader in your application, you need to define a bean for that reader, provide values to all the required properties within the bean, and pass the id of such bean as a value to the attribute of the chunk element reader (same for writer).

ItemReader

It is the entity of a step (of a batch process) which reads data. An ItemReader reads one item a time. Spring Batch provides an Interface ItemReader. All the readers implement this interface.

Following are some of the predefined ItemReader classes provided by Spring Batch to read from various sources.

Reader Purpose
FlatFIleItemReader To read data from flat files.
StaxEventItemReader To read data from XML files.
StoredProcedureItemReader To read data from the stored procedures of a database.
JDBCPagingItemReader To read data from relational databases database.
MongoItemReader To read data from MongoDB.
Neo4jItemReader To read data from Neo4jItemReader.

We need to configure the ItemReaders by creating the beans. Following is an example of StaxEventItemReader which reads data from an XML file.

<bean id = "mysqlItemWriter" 
   class = "org.springframework.batch.item.xml.StaxEventItemWriter"> 
   <property name = "resource" value = "file:xml/outputs/userss.xml" /> 
   <property name = "marshaller" ref = "reportMarshaller" /> 
   <property name = "rootTagName" value = "Tutorial" /> 
</bean> 

<bean id = "reportMarshaller" 
   class = "org.springframework.oxm.jaxb.Jaxb2Marshaller"> 
   <property name = "classesToBeBound"> 
      <list> 
         <value>Tutorial</value> 
      </list> 
   </property> 
</bean> 

As observed, while configuring, we need to specify the respective class name of the required reader and we need to provide values to all the required properties.

ItemWriter

It is the element of the step of a batch process which writes data. An ItemWriter writes one item a time. Spring Batch provides an Interface ItemWriter. All the writers implement this interface.

Following are some of the predefined ItemWriter classes provided by Spring Batch to read from various sources.

Writer Purpose
FlatFIleItemWriter To write data into flat files.
StaxEventItemWriter To write data into XML files.
StoredProcedureItemWriter To write data into the stored procedures of a database.
JDBCPagingItemWriter To write data into relational databases database.
MongoItemWriter To write data into MongoDB.
Neo4jItemWriter To write data into Neo4j.

In same way, we need to configure the ItemWriters by creating the beans. Following is an example of JdbcCursorItemReader which writes data to an MySQL database.

<bean id = "dbItemReader"
   class = "org.springframework.batch.item.database.JdbcCursorItemReader" scope = "step">
   <property name = "dataSource" ref = "dataSource" />
   <property name = "sql" value = "select * from tutorialsdata" />
   <property name = "rowMapper">
      <bean class = "TutorialRowMapper" /> 
   </property>
</bean>

Item Processor

ItemProcessor: An ItemProcessor is used to process the data. When the given item is not valid it returns null, else it processes the given item and returns the processed result. The interface ItemProcessor<I,O> represents the processor.

Tasklet class − When no reader and writer are given, a Tasklet acts as a processor for SpringBatch. It processes only single task.

We can define a custom item processor by implementing the interface ItemProcessor of the package org.springframework.batch.item.ItemProcessor. This ItemProcessor class accepts an object and processes the data and returns the processed data as another object.

In a batch process, if "n" records or data elements are read, then for each record, it will read the data, process it, and writes the data in the writer. To process the data, it relays on the processor passed.

For example, let’s suppose you have written code to load a particular PDF document, create a new page, write the data item on to the PDF in a tabular format. If you execute this application, it reads all the data items from the XML document, stores them in the MySQL database, and prints them in the given PDF document in individual pages.

Example

Following is a sample ItemProcessor class.

import org.springframework.batch.item.ItemProcessor;  

public class CustomItemProcessor implements ItemProcessor<Tutorial, Tutorial> {  
   
   @Override 
   public Tutorial process(Tutorial item) throws Exception {  
      System.out.println("Processing..." + item); 
      return item; 
   } 
} 
Advertisements