InterviewSolution
| 1. |
What do you mean by Stemming in NLP? |
|
Answer» Stemming is the process of removing suffixes from a word to reduce it to a base form, so that all the different variants of that word are represented by the same form (e.g., “bird” and “birds” are both reduced to “bird”). This is done with a fixed set of rules; for instance, if a word ends in “-es,” we remove the “-es.” Even though these rules do not always produce a linguistically correct base form, stemming is commonly used in search engines to match user queries to relevant documents, and in text classification to reduce the feature space used to train machine learning (ML) models. The code snippet below shows how to use a well-known stemming algorithm, the Porter stemmer, through NLTK:

    from nltk.stem.porter import PorterStemmer

    stemmer = PorterStemmer()
    word1, word2 = "bikes", "revolution"
    print(stemmer.stem(word1), stemmer.stem(word2))

This gives “bike” as the stem of “bikes,” but “revolut” as the stem of “revolution,” which is not a linguistically correct word. This may not affect the performance of a search engine, but other applications need the correct linguistic base form. That can be obtained with a closely related process known as lemmatization. |
|
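To make the idea of a "fixed set of rules" concrete, here is a minimal sketch of a rule-based suffix stripper in plain Python. It is not the Porter algorithm, which applies many more rules in several phases; the names `SUFFIX_RULES` and `naive_stem`, and the minimum-length check, are hypothetical choices made only for illustration.

```python
# Hypothetical rule table: (suffix, replacement), checked in order.
# This is a toy illustration, not the Porter stemmer's actual rule set.
SUFFIX_RULES = [
    ("ies", "y"),
    ("es", ""),
    ("s", ""),
]


def naive_stem(word: str) -> str:
    """Apply the first matching suffix rule and return the result."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Assumed safeguard: keep at least 3 characters so that
        # short words such as "is" are left untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["bird", "birds", "boxes", "ponies", "bikes"]:
    print(w, "->", naive_stem(w))
```

With these rules, “bird” and “birds” map to the same form, while “bikes” becomes “bik” because the “-es” rule fires before the “-s” rule. That is the kind of linguistically incorrect stem that blunt rules produce, and the reason real stemmers such as Porter's use a more careful rule set.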