While working with NLP, especially in the case of frequency analysis and text indexing, it is always beneficial to compress the vocabulary without losing meaning because it saves lots of memory. To achieve this, we must have to define mapping of a word to its synonyms. In the example below, we will be creating a class named word_syn_replacer which can be used for replacing the words with their common synonyms.
First, import the necessary package re to work with regular expressions.
import re from nltk.corpus import wordnet
Next, create the class that takes a word replacement mapping −
class word_syn_replacer(object): def __init__(self, word_map): self.word_map = word_map def replace(self, word): return self.word_map.get(word, word)
Save this python program (say replacesyn.py) and run it from python command prompt. After running it, import word_syn_replacer class when you want to replace words with common synonyms. Let us see how.
from replacesyn import word_syn_replacer rep_syn = word_syn_replacer ({‘bday’: ‘birthday’) rep_syn.replace(‘bday’)
'birthday'
import re from nltk.corpus import wordnet class word_syn_replacer(object): def __init__(self, word_map): self.word_map = word_map def replace(self, word): return self.word_map.get(word, word)
Now once you saved the above program and run it, you can import the class and use it as follows −
from replacesyn import word_syn_replacer rep_syn = word_syn_replacer ({‘bday’: ‘birthday’) rep_syn.replace(‘bday’)
'birthday'
The disadvantage of the above method is that we should have to hardcode the synonyms in a Python dictionary. We have two better alternatives in the form of CSV and YAML file. We can save our synonym vocabulary in any of the above-mentioned files and can construct word_map dictionary from them. Let us understand the concept with the help of examples.
In order to use CSV file for this purpose, the file should have two columns, first column consist of word and the second column consists of the synonyms meant to replace it. Let us save this file as syn.csv. In the example below, we will be creating a class named CSVword_syn_replacer which will extends word_syn_replacer in replacesyn.py file and will be used to construct the word_map dictionary from syn.csv file.
First, import the necessary packages.
import csv
Next, create the class that takes a word replacement mapping −
class CSVword_syn_replacer(word_syn_replacer): def __init__(self, fname): word_map = {} for line in csv.reader(open(fname)): word, syn = line word_map[word] = syn super(Csvword_syn_replacer, self).__init__(word_map)
After running it, import CSVword_syn_replacer class when you want to replace words with common synonyms. Let us see how?
from replacesyn import CSVword_syn_replacer rep_syn = CSVword_syn_replacer (‘syn.csv’) rep_syn.replace(‘bday’)
'birthday'
import csv class CSVword_syn_replacer(word_syn_replacer): def __init__(self, fname): word_map = {} for line in csv.reader(open(fname)): word, syn = line word_map[word] = syn super(Csvword_syn_replacer, self).__init__(word_map)
Now once you saved the above program and run it, you can import the class and use it as follows −
from replacesyn import CSVword_syn_replacer rep_syn = CSVword_syn_replacer (‘syn.csv’) rep_syn.replace(‘bday’)
'birthday'
As we have used CSV file, we can also use YAML file to for this purpose (we must have PyYAML installed). Let us save the file as syn.yaml. In the example below, we will be creating a class named YAMLword_syn_replacer which will extends word_syn_replacer in replacesyn.py file and will be used to construct the word_map dictionary from syn.yaml file.
First, import the necessary packages.
import yaml
Next, create the class that takes a word replacement mapping −
class YAMLword_syn_replacer(word_syn_replacer): def __init__(self, fname): word_map = yaml.load(open(fname)) super(YamlWordReplacer, self).__init__(word_map)
After running it, import YAMLword_syn_replacer class when you want to replace words with common synonyms. Let us see how?
from replacesyn import YAMLword_syn_replacer rep_syn = YAMLword_syn_replacer (‘syn.yaml’) rep_syn.replace(‘bday’)
'birthday'
import yaml class YAMLword_syn_replacer(word_syn_replacer): def __init__(self, fname): word_map = yaml.load(open(fname)) super(YamlWordReplacer, self).__init__(word_map)
Now once you saved the above program and run it, you can import the class and use it as follows −
from replacesyn import YAMLword_syn_replacer rep_syn = YAMLword_syn_replacer (‘syn.yaml’) rep_syn.replace(‘bday’)
'birthday'
As we know that an antonym is a word having opposite meaning of another word, and the opposite of synonym replacement is called antonym replacement. In this section, we will be dealing with antonym replacement, i.e., replacing words with unambiguous antonyms by using WordNet. In the example below, we will be creating a class named word_antonym_replacer which have two methods, one for replacing the word and other for removing the negations.
First, import the necessary packages.
from nltk.corpus import wordnet
Next, create the class named word_antonym_replacer −
class word_antonym_replacer(object): def replace(self, word, pos=None): antonyms = set() for syn in wordnet.synsets(word, pos=pos): for lemma in syn.lemmas(): for antonym in lemma.antonyms(): antonyms.add(antonym.name()) if len(antonyms) == 1: return antonyms.pop() else: return None def replace_negations(self, sent): i, l = 0, len(sent) words = [] while i < l: word = sent[i] if word == 'not' and i+1 < l: ant = self.replace(sent[i+1]) if ant: words.append(ant) i += 2 continue words.append(word) i += 1 return words
Save this python program (say replaceantonym.py) and run it from python command prompt. After running it, import word_antonym_replacer class when you want to replace words with their unambiguous antonyms. Let us see how.
from replacerantonym import word_antonym_replacer rep_antonym = word_antonym_replacer () rep_antonym.replace(‘uglify’)
['beautify''] sentence = ["Let us", 'not', 'uglify', 'our', 'country'] rep_antonym.replace _negations(sentence)
["Let us", 'beautify', 'our', 'country']
nltk.corpus import wordnet class word_antonym_replacer(object): def replace(self, word, pos=None): antonyms = set() for syn in wordnet.synsets(word, pos=pos): for lemma in syn.lemmas(): for antonym in lemma.antonyms(): antonyms.add(antonym.name()) if len(antonyms) == 1: return antonyms.pop() else: return None def replace_negations(self, sent): i, l = 0, len(sent) words = [] while i < l: word = sent[i] if word == 'not' and i+1 < l: ant = self.replace(sent[i+1]) if ant: words.append(ant) i += 2 continue words.append(word) i += 1 return words
Now once you saved the above program and run it, you can import the class and use it as follows −
from replacerantonym import word_antonym_replacer rep_antonym = word_antonym_replacer () rep_antonym.replace(‘uglify’) sentence = ["Let us", 'not', 'uglify', 'our', 'country'] rep_antonym.replace _negations(sentence)
["Let us", 'beautify', 'our', 'country']