Using OpenNLP, you can also detect the parts of speech (POS) of the words in a given sentence and print them. Instead of the full names of the parts of speech, OpenNLP uses a short tag for each of them. The following table lists some of the tags produced by OpenNLP and their meanings.
Tag | Meaning |
---|---|
NN | Noun, singular or mass |
DT | Determiner |
VB | Verb, base form |
VBD | Verb, past tense |
VBZ | Verb, third person singular present |
IN | Preposition or subordinating conjunction |
NNP | Proper noun, singular |
TO | to |
JJ | Adjective |
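To make tagged output easier to read, the abbreviations in the table can be expanded programmatically. The following is a minimal sketch that uses only the JDK; the class name PosTagGlossary and its describe() method are hypothetical helpers, not part of OpenNLP, and the map covers only the tags listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of OpenNLP): expands the short POS tags
// from the table above into their meanings.
final class PosTagGlossary {

    private static final Map<String, String> MEANINGS = new LinkedHashMap<>();

    static {
        MEANINGS.put("NN", "Noun, singular or mass");
        MEANINGS.put("DT", "Determiner");
        MEANINGS.put("VB", "Verb, base form");
        MEANINGS.put("VBD", "Verb, past tense");
        MEANINGS.put("VBZ", "Verb, third person singular present");
        MEANINGS.put("IN", "Preposition or subordinating conjunction");
        MEANINGS.put("NNP", "Proper noun, singular");
        MEANINGS.put("TO", "to");
        MEANINGS.put("JJ", "Adjective");
    }

    // Returns the meaning of a tag, or "Unknown tag" for tags not in the table.
    static String describe(String tag) {
        return MEANINGS.getOrDefault(tag, "Unknown tag");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNP", "JJ", "TO", "VB"}) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```

A tagger can emit tags that are not in this table, which is why describe() falls back to a default instead of throwing.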
To tag the parts of speech of a sentence, OpenNLP uses a model, a file named en-pos-maxent.bin. This is a pre-trained model that tags the parts of speech of the given raw text.
The POSTaggerME class of the opennlp.tools.postag package uses this model to tag the parts of speech of the given raw text. To do so, you need to −
Load the en-pos-maxent.bin model using the POSModel class.
Instantiate the POSTaggerME class.
Tokenize the sentence.
Generate the tags using the tag() method.
Print the tokens and tags using the POSSample class.
The following steps describe how to write a program that tags the parts of speech in the given raw text using the POSTaggerME class.
The model for POS tagging is represented by the class named POSModel, which belongs to the package opennlp.tools.postag.
To load the POS tagger model −
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of the model in String format to its constructor).
Instantiate the POSModel class and pass the InputStream (object) of the model as a parameter to its constructor, as shown in the following code block −

    //Loading Parts of speech-maxent model
    InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-pos-maxent.bin");
    POSModel model = new POSModel(inputStream);

The POSTaggerME class of the package opennlp.tools.postag is used to predict the parts of speech of the given raw text. It uses Maximum Entropy to make its decisions.
Instantiate this class and pass the model object created in the previous step, as shown below −

    //Instantiating POSTaggerME class
    POSTaggerME tagger = new POSTaggerME(model);

The tokenize() method of the WhitespaceTokenizer class is used to tokenize the raw text passed to it. This method accepts a String as a parameter and returns an array of Strings (tokens).
Obtain the shared WhitespaceTokenizer object through its INSTANCE field and invoke this method, passing the sentence to it as a String.

    //Tokenizing the sentence using WhitespaceTokenizer class
    WhitespaceTokenizer whitespaceTokenizer= WhitespaceTokenizer.INSTANCE;
    String[] tokens = whitespaceTokenizer.tokenize(sentence);

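To illustrate what the resulting token array looks like, here is a JDK-only sketch that splits a sentence on runs of whitespace. It is a stand-in for illustration and not OpenNLP's implementation; the class and method names are hypothetical. Because the split is purely whitespace-based, punctuation stays attached to the neighbouring word.

```java
import java.util.Arrays;

// Hypothetical stand-in (not OpenNLP code): splits text on runs of whitespace
// to show the kind of String[] that is later handed to the tagger.
final class WhitespaceSplitSketch {

    static String[] tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // no tokens in blank input
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [Hi, welcome, to, Howcodex]
        System.out.println(Arrays.toString(tokenize("Hi welcome to Howcodex")));
        // prints [Hello,, world.] because the comma and full stop stay attached
        System.out.println(Arrays.toString(tokenize("Hello, world.")));
    }
}
```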
The tag() method of the POSTaggerME class assigns POS tags to a sentence of tokens. This method accepts an array of tokens (String) as a parameter and returns an array of tags.
Invoke the tag() method by passing the tokens generated in the previous step to it.

    //Generating tags
    String[] tags = tagger.tag(tokens);

The POSSample class represents the POS-tagged sentence. To instantiate this class, we would require an array of tokens (of the text) and an array of tags.
The toString() method of this class returns the tagged sentence. Instantiate this class by passing the token and the tag arrays created in the previous steps and invoke its toString() method, as shown in the following code block.

    //Instantiating the POSSample class
    POSSample sample = new POSSample(tokens, tags);
    System.out.println(sample.toString());

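The tagged sentence printed in this tutorial has the form token_TAG, with the pairs separated by spaces. The sketch below reproduces that layout in plain Java to show how the token and tag arrays line up index by index. TaggedSentenceFormatter is a hypothetical helper written for illustration; it is not how POSSample is implemented.

```java
// Hypothetical helper (not OpenNLP code): joins parallel token and tag arrays
// into the "token_TAG token_TAG ..." layout seen in this tutorial's output.
final class TaggedSentenceFormatter {

    static String format(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("Expected one tag per token");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' '); // single space between pairs
            }
            sb.append(tokens[i]).append('_').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Hi", "welcome", "to", "Howcodex"};
        String[] tags = {"NNP", "JJ", "TO", "VB"};
        // prints Hi_NNP welcome_JJ to_TO Howcodex_VB
        System.out.println(format(tokens, tags));
    }
}
```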
Example
Following is the program which tags the parts of speech in a given raw text. Save this program in a file with the name PosTaggerExample.java.

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    public class PosTaggerExample {

        public static void main(String args[]) throws Exception{

            //Loading Parts of speech-maxent model
            InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-pos-maxent.bin");
            POSModel model = new POSModel(inputStream);

            //Instantiating POSTaggerME class
            POSTaggerME tagger = new POSTaggerME(model);

            String sentence = "Hi welcome to Howcodex";

            //Tokenizing the sentence using WhitespaceTokenizer class
            WhitespaceTokenizer whitespaceTokenizer= WhitespaceTokenizer.INSTANCE;
            String[] tokens = whitespaceTokenizer.tokenize(sentence);

            //Generating tags
            String[] tags = tagger.tag(tokens);

            //Instantiating the POSSample class
            POSSample sample = new POSSample(tokens, tags);
            System.out.println(sample.toString());
        }
    }

Compile and execute the saved Java file from the Command prompt using the following commands −

    javac PosTaggerExample.java
    java PosTaggerExample

On execution, the above program reads the given text, detects the part of speech of each token, and displays the tagged sentence, as shown below.
Hi_NNP welcome_JJ to_TO Howcodex_VB
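If the tagged line needs further processing, it can be split back into tokens and tags. The following is a minimal sketch, assuming the token_TAG layout shown above; it cuts each pair at its last underscore so that tokens which themselves contain underscores survive. The class name TaggedLineParser is hypothetical and not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not OpenNLP code): splits a "token_TAG ..." line
// into {token, tag} pairs.
final class TaggedLineParser {

    static List<String[]> parse(String line) {
        List<String[]> pairs = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return pairs; // nothing to parse
        }
        for (String item : trimmed.split("\\s+")) {
            // The tag follows the LAST underscore; the token may contain others.
            int cut = item.lastIndexOf('_');
            if (cut <= 0 || cut == item.length() - 1) {
                throw new IllegalArgumentException("Not a token_TAG pair: " + item);
            }
            pairs.add(new String[] {item.substring(0, cut), item.substring(cut + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parse("Hi_NNP welcome_JJ to_TO Howcodex_VB")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```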
Following is the program which tags the parts of speech of a given raw text. It also uses the PerformanceMonitor class to report performance figures for the tagger. Save this program in a file with the name PosTagger_Performance.java.

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.cmdline.PerformanceMonitor;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    public class PosTagger_Performance {

        public static void main(String args[]) throws Exception{

            //Loading Parts of speech-maxent model
            InputStream inputStream = new FileInputStream("C:/OpenNLP_models/en-pos-maxent.bin");
            POSModel model = new POSModel(inputStream);

            //Creating an object of WhitespaceTokenizer class
            WhitespaceTokenizer whitespaceTokenizer= WhitespaceTokenizer.INSTANCE;

            //Tokenizing the sentence
            String sentence = "Hi welcome to Howcodex";
            String[] tokens = whitespaceTokenizer.tokenize(sentence);

            //Instantiating POSTaggerME class
            POSTaggerME tagger = new POSTaggerME(model);

            //Generating tags
            String[] tags = tagger.tag(tokens);

            //Instantiating POSSample class
            POSSample sample = new POSSample(tokens, tags);
            System.out.println(sample.toString());

            //Monitoring the performance of POS tagger
            PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
            perfMon.start();
            perfMon.incrementCounter();
            perfMon.stopAndPrintFinalResult();
        }
    }

Compile and execute the saved Java file from the Command prompt using the following commands −
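
    javac PosTagger_Performance.java
    java PosTagger_Performance

On execution, the above program reads the given text, tags the part of speech of each token, and displays the tagged sentence. It then prints the PerformanceMonitor summary. Because the monitor is started only after tagging has finished, the reported runtime covers just the counter increment, not the tagging itself.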

    Hi_NNP welcome_JJ to_TO Howcodex_VB
    Average: 0.0 sent/s
    Total: 1 sent
    Runtime: 0.0s

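The Average figure is a throughput: the number of units processed divided by the elapsed time in seconds. The following JDK-only sketch shows that calculation with System.nanoTime(). It is not OpenNLP's PerformanceMonitor; the class and method names are hypothetical, and the loop body is only a placeholder for real work such as a call to tagger.tag(tokens). Returning 0.0 when no time has elapsed is a choice made here to avoid dividing by zero.

```java
// Hypothetical sketch (not OpenNLP code): measures how many sentences
// are processed per second.
final class ThroughputSketch {

    // Units per second; 0.0 when no measurable time has elapsed.
    static double perSecond(long count, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return count / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        String[] sentences = {
            "Hi welcome to Howcodex",
            "This is a second sentence"
        };

        long start = System.nanoTime(); // start timing BEFORE the work
        long processed = 0;
        for (String sentence : sentences) {
            sentence.trim().split("\\s+"); // placeholder for tokenizing and tagging
            processed++;
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("Average: %.1f sent/s%n", perSecond(processed, elapsed));
        System.out.println("Total: " + processed + " sent");
        System.out.printf("Runtime: %.3fs%n", elapsed / 1_000_000_000.0);
    }
}
```

Timing a single short sentence gives numbers too small to be meaningful, so a measurement like this is more useful over a large batch of sentences.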
The probs() method of the POSTaggerME class returns the probability of each tag assigned in the most recently tagged sentence.

    //Getting the probabilities of the tags of the last tagged sentence
    double[] probs = tagger.probs();

Following is the program which displays the probabilities for each tag of the last tagged sentence. Save this program in a file with the name PosTaggerProbs.java.

    import java.io.FileInputStream;
    import java.io.InputStream;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    public class PosTaggerProbs {

        public static void main(String args[]) throws Exception{

            //Loading Parts of speech-maxent model
            InputStream inputStream = new FileInputStream("C:/OpenNLP_mdl/en-pos-maxent.bin");
            POSModel model = new POSModel(inputStream);

            //Creating an object of WhitespaceTokenizer class
            WhitespaceTokenizer whitespaceTokenizer= WhitespaceTokenizer.INSTANCE;

            //Tokenizing the sentence
            String sentence = "Hi welcome to Howcodex";
            String[] tokens = whitespaceTokenizer.tokenize(sentence);

            //Instantiating POSTaggerME class
            POSTaggerME tagger = new POSTaggerME(model);

            //Generating tags
            String[] tags = tagger.tag(tokens);

            //Instantiating the POSSample class
            POSSample sample = new POSSample(tokens, tags);
            System.out.println(sample.toString());

            //Probabilities for each tag of the last tagged sentence.
            double [] probs = tagger.probs();
            System.out.println(" ");

            //Printing the probabilities
            for(int i = 0; i<probs.length; i++)
                System.out.println(probs[i]);
        }
    }

Compile and execute the saved Java file from the Command prompt using the following commands −

    javac PosTaggerProbs.java
    java PosTaggerProbs

On execution, the above program reads the given raw text, tags the part of speech of each token in it, and displays the tagged sentence. In addition, it displays the probability of the tag assigned to each token, as shown below.

    Hi_NNP welcome_JJ to_TO Howcodex_VB

    0.6416834779738033
    0.42983612874819177
    0.8584513635863117
    0.4394784478206072

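The probabilities line up with the tokens and tags by index, so they can be used to flag tags the model was unsure about. The sketch below does this with the values printed above, using only the JDK. The class name LowConfidenceTags and the 0.5 threshold are assumptions made for illustration; a suitable threshold depends on the application.

```java
// Hypothetical sketch (not OpenNLP code): pairs tokens, tags and probabilities
// by index and reports the tags that fall below a confidence threshold.
final class LowConfidenceTags {

    // Index of the smallest probability in a non-empty array.
    static int indexOfLowest(double[] probs) {
        if (probs.length == 0) {
            throw new IllegalArgumentException("No probabilities given");
        }
        int lowest = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] < probs[lowest]) {
                lowest = i;
            }
        }
        return lowest;
    }

    public static void main(String[] args) {
        // Values copied from the sample output above.
        String[] tokens = {"Hi", "welcome", "to", "Howcodex"};
        String[] tags = {"NNP", "JJ", "TO", "VB"};
        double[] probs = {0.6416834779738033, 0.42983612874819177,
                          0.8584513635863117, 0.4394784478206072};

        double threshold = 0.5; // arbitrary cut-off chosen for this sketch
        for (int i = 0; i < tokens.length; i++) {
            if (probs[i] < threshold) {
                System.out.println("Low confidence: " + tokens[i] + "_" + tags[i]
                        + " (" + probs[i] + ")");
            }
        }

        int lowest = indexOfLowest(probs);
        System.out.println("Least certain token: " + tokens[lowest]);
    }
}
```

With these values, the tags for "welcome" and "Howcodex" fall below the threshold, and "welcome" is the least certain.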