OpenNLP - Parsing the Sentences



Using the OpenNLP API, you can parse sentences, that is, build a syntactic parse tree for each one. In this chapter, we will discuss how to parse raw text using the OpenNLP API.

Parsing Raw Text using OpenNLP Library

To parse sentences, OpenNLP uses a predefined model, a file named en-parserchunking.bin, which is trained to parse raw English text.

The ParserTool class of the opennlp.tools.cmdline.parser package is used to parse the content. The parser itself is represented by the Parser interface of the opennlp.tools.parser package, and the Parse class of the same package holds the resulting parse constituents.

The following steps show how to write a program that parses raw text using the ParserTool class.

Step 1: Loading the model

The model for parsing text is represented by the class named ParserModel, which belongs to the package opennlp.tools.parser.

To load the parser model −

  • Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of the model in String format to its constructor).

  • Instantiate the ParserModel class and pass the InputStream (object) of the model as a parameter to its constructor, as shown in the following code block.

//Loading parser model 
InputStream inputStream = new FileInputStream(".../en-parserchunking.bin"); 
ParserModel model = new ParserModel(inputStream);
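The stream-opening part of this step can be sketched with the standard library alone. The following is a minimal sketch, not part of OpenNLP: the class name ModelFiles and the method open() are hypothetical helpers introduced here for illustration. It checks that the model file exists before opening it, and the try-with-resources statement closes the stream once it has been read.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not an OpenNLP class): opens a model file safely.
class ModelFiles {

    /** Opens the file at the given location, failing early if it is missing. */
    static InputStream open(String location) throws IOException {
        Path path = Paths.get(location);
        if (!Files.isRegularFile(path)) {
            throw new NoSuchFileException(location, null,
                "model file not found or not a regular file");
        }
        return new BufferedInputStream(Files.newInputStream(path));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ModelFiles <path-to-model-file>");
            return;
        }
        // try-with-resources closes the stream when the block ends
        try (InputStream in = open(args[0])) {
            // In the tutorial's program, the stream would be passed to
            // new ParserModel(in) at this point.
            System.out.println("First byte of the file: " + in.read());
        }
    }
}
```

Failing before the model constructor is called tends to give a clearer message when the path to the model file is wrong.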

Step 2: Creating an object of the Parser class

The Parser interface of the package opennlp.tools.parser represents a syntactic parser; the constituents it produces are held in Parse objects. You can obtain a Parser using the static create() method of the ParserFactory class.

Invoke the create() method of the ParserFactory by passing the model object created in the previous step, as shown below −

//Creating a parser 
Parser parser = ParserFactory.create(model); 

Step 3: Parsing the sentence

The parseLine() method of the ParserTool class is used to parse the raw text in OpenNLP. This method accepts −

  • a String variable representing the text to be parsed.

  • a parser object.

  • an integer representing the number of parses to be returned.

Invoke this method by passing the following parameters: the sentence to be parsed, the parser object created in the previous step, and an integer representing the required number of parses.

//Parsing the sentence 
String sentence = "Howcodex is the largest tutorial library.";       
Parse topParses[] = ParserTool.parseLine(sentence, parser, 1);
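Judging from the output shown later in this chapter, where the final period stays attached to its word as library., the three-argument parseLine() appears to split the text on whitespace only. If that is not what you want, one option is to put a space before trailing punctuation before calling it. The following is a rough sketch using only the standard library; the class PunctuationSpacer and its separate() method are hypothetical names introduced here, not part of OpenNLP, and the rule is deliberately crude (it would also split the last period of an abbreviation such as "U.S.").

```java
import java.util.regex.Pattern;

// Hypothetical helper (not an OpenNLP class): detaches trailing punctuation.
class PunctuationSpacer {

    // A run of punctuation that directly follows a word character and is
    // followed by whitespace or the end of the text.
    private static final Pattern ATTACHED =
        Pattern.compile("(?<=[^\\s.,!?;:])([.,!?;:]+)(?=\\s|$)");

    /** Inserts a space before punctuation that is attached to a word. */
    static String separate(String sentence) {
        return ATTACHED.matcher(sentence).replaceAll(" $1");
    }

    public static void main(String[] args) {
        // Prints: Howcodex is the largest tutorial library .
        System.out.println(separate("Howcodex is the largest tutorial library."));
    }
}
```

The returned string can then be passed as the first argument of parseLine() in place of the raw sentence.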

Example

Following is the program which parses the given raw text. Save this program in a file with the name ParserExample.java.

import java.io.FileInputStream; 
import java.io.InputStream;  

import opennlp.tools.cmdline.parser.ParserTool; 
import opennlp.tools.parser.Parse; 
import opennlp.tools.parser.Parser; 
import opennlp.tools.parser.ParserFactory; 
import opennlp.tools.parser.ParserModel;  

public class ParserExample { 
   
   public static void main(String args[]) throws Exception{  
      //Loading parser model (the stream is closed once the model has been read) 
      ParserModel model; 
      try (InputStream inputStream = new FileInputStream(".../en-parserchunking.bin")) { 
         model = new ParserModel(inputStream); 
      } 
       
      //Creating a parser 
      Parser parser = ParserFactory.create(model); 
      
      //Parsing the sentence 
      String sentence = "Howcodex is the largest tutorial library.";
      Parse topParses[] = ParserTool.parseLine(sentence, parser, 1); 
    
      for (Parse p : topParses) 
         p.show();          
   } 
}      

Compile and execute the saved Java file from the command prompt using the following commands (the OpenNLP tools JAR must be on the classpath) −

javac ParserExample.java 
java ParserExample 

On executing, the above program reads the given raw text, parses it, and displays the following output −

(TOP (S (NP (NN Howcodex)) (VP (VBZ is) (NP (DT the) (JJS largest) (NN tutorial) (NN library.)))))
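
The bracketed output is a tree: each pair of parentheses holds a label followed by either child nodes or a single word. To make that structure concrete, here is a minimal sketch that reads such a string with plain Java and lists each word with its part-of-speech tag. The class BracketedParseReader and its methods are hypothetical and written only for this illustration; they are not part of OpenNLP, whose Parse class offers its own accessors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader (not an OpenNLP class) for strings like "(TOP (S ...))".
class BracketedParseReader {

    /** A tree node: a label plus either children or a single word. */
    static final class Node {
        final String label;
        final String word; // non-null only for leaf (part-of-speech) nodes
        final List<Node> children = new ArrayList<>();

        Node(String label, String word) {
            this.label = label;
            this.word = word;
        }
    }

    private final String text;
    private int pos = 0;

    private BracketedParseReader(String text) {
        this.text = text;
    }

    /** Parses one bracketed tree and returns its root node. */
    static Node parse(String text) {
        BracketedParseReader reader = new BracketedParseReader(text.trim());
        Node root = reader.readNode();
        if (reader.pos != reader.text.length()) {
            throw new IllegalArgumentException("Trailing text after the root node");
        }
        return root;
    }

    private Node readNode() {
        skipWhitespace();
        expect('(');
        String label = readToken();
        skipWhitespace();
        if (pos < text.length() && text.charAt(pos) == '(') {
            // Inner node: read children until the closing parenthesis
            Node node = new Node(label, null);
            while (pos < text.length() && text.charAt(pos) == '(') {
                node.children.add(readNode());
                skipWhitespace();
            }
            expect(')');
            return node;
        }
        // Leaf node: the label is a tag and the next token is the word
        String word = readToken();
        skipWhitespace();
        expect(')');
        return new Node(label, word);
    }

    private String readToken() {
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))
                && text.charAt(pos) != '(' && text.charAt(pos) != ')') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a token at index " + pos);
        }
        return text.substring(start, pos);
    }

    private void skipWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    private void expect(char c) {
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at index " + pos);
        }
        pos++;
    }

    /** Collects "word/TAG" pairs from the leaves, left to right. */
    static List<String> taggedWords(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.word != null) {
            out.add(node.word + "/" + node.label);
            return;
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        String output = "(TOP (S (NP (NN Howcodex)) (VP (VBZ is) (NP (DT the) "
            + "(JJS largest) (NN tutorial) (NN library.)))))";
        // Prints: [Howcodex/NN, is/VBZ, the/DT, largest/JJS, tutorial/NN, library./NN]
        System.out.println(taggedWords(parse(output)));
    }
}
```

Here TOP is the root, S is the sentence, NP and VP are the noun and verb phrases, and the innermost labels (NN, VBZ, DT, JJS) are the part-of-speech tags of the individual words.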