Using the OpenNLP API, you can parse sentences. In this chapter, we will discuss how to parse raw text using the OpenNLP API.
To parse sentences, OpenNLP uses a predefined model, a file named en-parser-chunking.bin. This model is pretrained to parse raw text.
The Parse class of the opennlp.tools.parser package is used to hold the parse constituents, and the ParserTool class of the opennlp.tools.cmdline.parser package is used to parse the content.
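To make "parse constituents" concrete, here is a minimal, hypothetical tree node in plain Java. It is not OpenNLP's Parse class; it only mimics the idea that each constituent has a type (a phrase label such as NP, or a part-of-speech tag such as DT) plus child constituents, and that leaves cover a single token. The class and method names (Constituent, leaf, phrase, toBracketed) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal stand-in for a parse constituent (NOT opennlp.tools.parser.Parse).
class Constituent {
    final String type;                 // constituent label, e.g. "NP" or "DT"
    final String token;                // covered token for leaves, null for phrases
    final List<Constituent> children = new ArrayList<>();

    private Constituent(String type, String token) {
        this.type = type;
        this.token = token;
    }

    // A leaf: a part-of-speech tag covering exactly one token.
    static Constituent leaf(String tag, String token) {
        return new Constituent(tag, token);
    }

    // A phrase: a label with one or more child constituents.
    static Constituent phrase(String label, Constituent... kids) {
        Constituent c = new Constituent(label, null);
        for (Constituent k : kids) {
            c.children.add(k);
        }
        return c;
    }

    // Renders the subtree in Penn Treebank-style brackets.
    String toBracketed() {
        StringBuilder sb = new StringBuilder("(").append(type);
        if (token != null) {
            sb.append(' ').append(token);
        }
        for (Constituent child : children) {
            sb.append(' ').append(child.toBracketed());
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Constituent np = phrase("NP", leaf("DT", "the"), leaf("NN", "library"));
        System.out.println(np.toBracketed()); // prints (NP (DT the) (NN library))
    }
}
```

The bracketed rendering uses the same notation that Parse.show() prints at the end of this chapter, which is why it was chosen here.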
Following are the steps to write a program that parses raw text using the ParserTool class.
The model for parsing text is represented by the class named ParserModel, which belongs to the package opennlp.tools.parser.
To load a parser model −
Create an InputStream object for the model (instantiate FileInputStream and pass the path of the model as a String to its constructor).
Instantiate the ParserModel class and pass the InputStream object of the model as a parameter to its constructor, as shown in the following code block.
//Loading parser model
InputStream inputStream = new FileInputStream(".../en-parser-chunking.bin");
ParserModel model = new ParserModel(inputStream);
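The "..." at the start of the path above stands for the directory where you saved the model, and a wrong path is the most common reason this step fails. The JDK-only sketch below shows one way to fail fast with a clear message: a hypothetical helper, openModelStream, checks that the file is readable before opening it, and the caller opens the stream in a try-with-resources block so it is closed afterwards. The helper name and the default file location are assumptions; in the real program the ParserModel would be constructed where the comment indicates.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ModelLoading {

    // Hypothetical helper: fail fast with a clear message if the model file is missing.
    static InputStream openModelStream(Path modelPath) throws IOException {
        if (!Files.isReadable(modelPath)) {
            throw new FileNotFoundException(
                    "Parser model not found or unreadable: " + modelPath.toAbsolutePath());
        }
        return Files.newInputStream(modelPath);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real path as the first argument.
        Path modelPath = Paths.get(args.length > 0 ? args[0] : "en-parser-chunking.bin");
        // try-with-resources closes the stream even if reading the model fails.
        try (InputStream in = openModelStream(modelPath)) {
            // In the real program: ParserModel model = new ParserModel(in);
            System.out.println("Opened " + modelPath + " (" + Files.size(modelPath) + " bytes)");
        }
    }
}
```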
The Parser interface of the package opennlp.tools.parser represents a full syntactic parser. You can obtain an instance of it using the static create() method of the ParserFactory class.
Invoke the create() method of the ParserFactory class, passing the model object created in the previous step, as shown below −
//Creating a parser
Parser parser = ParserFactory.create(model);
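ParserFactory.create() returns the Parser interface rather than a concrete class. As far as we know, the factory looks at the type of parser the model was trained for (a chunking model such as en-parser-chunking.bin yields a chunking-based parser, while a tree-insert model yields a different implementation). The toy sketch below illustrates only that factory pattern in plain Java; every name in it (SentenceParser, ModelKind, ChunkingStyleParser, TreeInsertStyleParser, ToyParserFactory) is hypothetical and is not an OpenNLP type.

```java
// Hypothetical types illustrating the factory pattern; none of these are OpenNLP classes.
interface SentenceParser {
    String describe();
}

enum ModelKind { CHUNKING, TREE_INSERT }

class ChunkingStyleParser implements SentenceParser {
    public String describe() { return "chunking-based parser"; }
}

class TreeInsertStyleParser implements SentenceParser {
    public String describe() { return "tree-insert parser"; }
}

class ToyParserFactory {
    // Callers only see the interface; the concrete class is chosen from the model kind.
    static SentenceParser create(ModelKind kind) {
        switch (kind) {
            case CHUNKING:
                return new ChunkingStyleParser();
            case TREE_INSERT:
                return new TreeInsertStyleParser();
            default:
                throw new IllegalArgumentException("Unsupported model kind: " + kind);
        }
    }

    public static void main(String[] args) {
        SentenceParser parser = create(ModelKind.CHUNKING);
        System.out.println(parser.describe()); // prints chunking-based parser
    }
}
```

The benefit of this design is that the calling code (like the program in this chapter) does not change when you swap in a model of a different parser type.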
The parseLine() method of the ParserTool class is used to parse the raw text in OpenNLP. This method accepts −
a String variable representing the text to be parsed.
a Parser object.
an integer representing the number of alternative parses to return.
Invoke this method with the following parameters: the sentence, the parser object created in the previous step, and an integer representing the required number of parses.
//Parsing the sentence
String sentence = "Howcodex is the largest tutorial library.";
Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);
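Judging from the output at the end of this chapter, where library. (period included) appears as a single token, parseLine() splits the sentence on whitespace rather than running a statistical tokenizer. One workaround is to tokenize the sentence yourself and join the tokens with single spaces before calling parseLine(). The JDK-only sketch below shows both the whitespace split and a deliberately naive, hypothetical helper, separateTrailingPunctuation; in practice an OpenNLP tokenizer model would be a more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ParseLineInput {

    // Plain whitespace split, matching how the sentence appears to be tokenized.
    static List<String> whitespaceTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Hypothetical, naive pre-tokenizer: splits trailing . , ! ? ; : off each token.
    static String separateTrailingPunctuation(String line) {
        List<String> out = new ArrayList<>();
        for (String tok : whitespaceTokens(line)) {
            int end = tok.length();
            while (end > 0 && ".,!?;:".indexOf(tok.charAt(end - 1)) >= 0) {
                end--;
            }
            if (end > 0) {
                out.add(tok.substring(0, end));        // the word itself
            }
            for (int i = end; i < tok.length(); i++) {
                out.add(String.valueOf(tok.charAt(i))); // each punctuation mark on its own
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String sentence = "Howcodex is the largest tutorial library.";
        System.out.println(whitespaceTokens(sentence));
        // prints [Howcodex, is, the, largest, tutorial, library.]
        System.out.println(separateTrailingPunctuation(sentence));
        // prints Howcodex is the largest tutorial library .
    }
}
```

Passing the pre-tokenized string to ParserTool.parseLine() should let the parser treat the period as its own token; the exact tree it then produces depends on the model.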
Example
Following is the program which parses the given raw text. Save this program in a file with the name ParserExample.java.
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

public class ParserExample {

   public static void main(String[] args) throws Exception {
      //Loading parser model (try-with-resources closes the stream afterwards)
      ParserModel model;
      try (InputStream inputStream = new FileInputStream(".../en-parser-chunking.bin")) {
         model = new ParserModel(inputStream);
      }

      //Creating a parser
      Parser parser = ParserFactory.create(model);

      //Parsing the sentence
      String sentence = "Howcodex is the largest tutorial library.";
      Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);

      //Printing each parse tree in bracketed form
      for (Parse p : topParses) {
         p.show();
      }
   }
}
Compile and execute the saved Java file from the command prompt using the following commands (the OpenNLP tools JAR must be on your classpath) −
javac ParserExample.java
java ParserExample
When executed, the above program reads the given raw text, parses it, and displays the following output −
(TOP (S (NP (NN Howcodex)) (VP (VBZ is) (NP (DT the) (JJS largest) (NN tutorial) (NN library.)))))
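The printed tree is in Penn Treebank bracket notation: each pair of parentheses is a constituent whose first symbol is its label (TOP, S, NP, VP, or a part-of-speech tag such as NN or VBZ). Inside a program you would normally walk the Parse object itself, for example through its getChildren() and getType() methods; the JDK-only sketch below instead reads the printed string, which is enough to show how the nesting works. It uses a stack to collect the text covered by every constituent with a given label. The method name extractPhrases is hypothetical, and the reader assumes well-formed brackets and tokens that contain no parentheses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BracketReader {

    // Returns the covered text of every constituent with the given label,
    // in the order in which their closing brackets appear.
    static List<String> extractPhrases(String tree, String label) {
        // Pad brackets with spaces so a whitespace split yields "(", ")", labels and words.
        String[] parts = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        Deque<String> labels = new ArrayDeque<>();
        Deque<List<String>> words = new ArrayDeque<>();
        List<String> found = new ArrayList<>();
        boolean expectLabel = false;
        for (String part : parts) {
            if (part.equals("(")) {
                expectLabel = true;                      // next symbol is a label
            } else if (part.equals(")")) {
                String closedLabel = labels.pop();
                List<String> covered = words.pop();
                if (closedLabel.equals(label)) {
                    found.add(String.join(" ", covered));
                }
                if (!words.isEmpty()) {
                    words.peek().addAll(covered);        // a parent covers its children's words
                }
            } else if (expectLabel) {
                labels.push(part);
                words.push(new ArrayList<>());
                expectLabel = false;
            } else {
                words.peek().add(part);                  // a token under a part-of-speech tag
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String output = "(TOP (S (NP (NN Howcodex)) (VP (VBZ is) (NP (DT the) (JJS largest)"
                + " (NN tutorial) (NN library.)))))";
        System.out.println(extractPhrases(output, "NP"));
        // prints [Howcodex, the largest tutorial library.]
    }
}
```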