NLP Interview Questions and Answers
This section offers curated NLP interview questions with detailed answers to sharpen your knowledge and support interview preparation.
1. What do you mean by perplexity in NLP?
Answer» Perplexity is a statistic for evaluating the effectiveness of language models. It is described mathematically as a function of the probability that the language model assigns to a test sample. The perplexity of a test sample X = x1, x2, x3, ..., xN is given by PP(X) = P(x1, x2, ..., xN)^(-1/N), where N is the total number of word tokens. The higher the perplexity, the less predictive power the language model has over the sample; lower perplexity indicates a better model.
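As a concrete illustration, here is a minimal sketch of computing perplexity from per-token probabilities; the helper function and the probability values are illustrative assumptions, not part of any standard library.

```python
import math

def perplexity(probabilities):
    """Compute perplexity from the per-token probabilities a model
    assigns to a test sample: PP = P(x1..xN)^(-1/N)."""
    n = len(probabilities)
    # Sum log-probabilities instead of multiplying to avoid underflow.
    log_prob = sum(math.log(p) for p in probabilities)
    return math.exp(-log_prob / n)

# Hypothetical per-token probabilities from some language model.
token_probs = [0.1, 0.2, 0.05, 0.15]
print(perplexity(token_probs))  # ~9.04
```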
2. What is the meaning of N-gram in NLP?
Answer» Text n-grams are commonly used in text mining and natural language processing. They are essentially a collection of co-occurring words within a given window, and when computing the n-grams you typically advance one word at a time (although you can move X words forward in more advanced scenarios).
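A minimal sketch of extracting n-grams with a one-word step (the helper function and example sentence are illustrative assumptions):

```python
def ngrams(tokens, n):
    """Return the list of n-grams (tuples of n consecutive tokens),
    sliding the window forward one word at a time."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "natural language processing is fun".split()
print(ngrams(tokens, 2))
# [('natural', 'language'), ('language', 'processing'),
#  ('processing', 'is'), ('is', 'fun')]
```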
3. What is the meaning of Pragmatic Analysis in NLP?
Answer» Pragmatic analysis is concerned with outside-world knowledge, meaning information that is not contained in the documents and/or queries. It focuses on what was literally said and reinterprets it by what was truly meant, deriving the many aspects of language that require real-world knowledge.
4. What do you mean by Masked language modelling?
Answer» Masked language modelling is an NLP technique for recovering the original output from a corrupted input. Learners can use this approach to master deep representations that transfer to downstream tasks. Using this technique, you may predict a word based on the other words in the sentence. The process for masked language modelling is as follows: a fraction of the input tokens is replaced with a special mask token, and the model is trained to predict the original tokens from the surrounding context.
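A minimal sketch using the Hugging Face transformers library (assuming it is installed and the bert-base-uncased checkpoint can be downloaded):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
# The model predicts the [MASK] token from the surrounding words.
for prediction in unmasker("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Expected top prediction: "capital"
```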
5. What do you mean by Autoencoders?
Answer» A network that is used for learning a vector representation of the input in a compressed form is called an autoencoder. It is a type of unsupervised learning, since labels aren't needed for the process. It is mainly used to learn a mapping function from the input. To make the mapping useful, the input is reconstructed from the vector representation. After training is complete, the vector representation helps encode the input text as a dense vector. Autoencoders are generally used to create feature representations. The hidden layer holds a compressed representation of the source data that captures its essence, and the output layer, called the decoder, reconstructs the input from that representation.
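A minimal sketch of an autoencoder in PyTorch; the layer sizes and the bag-of-words input are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=1000, hidden_dim=64):
        super().__init__()
        # Encoder: compress the input into a dense vector.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Decoder: reconstruct the input from the compressed vector.
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(8, 1000)           # e.g., a batch of bag-of-words vectors
loss = nn.MSELoss()(model(x), x)  # reconstruction error drives training
```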
6. Explain the pipeline for Information extraction (IE) in NLP.
Answer» In comparison to text classification, the typical pipeline for IE necessitates more fine-grained NLP processing. For example, we'd need to know the part-of-speech tags of words to identify named entities (people, organisations, etc.). We would require coreference resolution to connect various references to the same entity (e.g., Albert Einstein, Einstein, the scientist, he, etc.). None of these stages is required for creating a text classification system, so IE is a more NLP-intensive operation than text categorisation. Not all steps in the pipeline are required for all IE tasks; different IE tasks necessitate different degrees of analysis. Key phrase extraction requires the least amount of NLP processing (some algorithms also do POS tagging before extracting key phrases), whereas the other IE tasks require deeper NLP pre-processing followed by models developed for those specific tasks. Standard evaluation sets are often used to assess IE tasks in terms of precision, recall, and F1 scores. Because of the various levels of NLP pre-processing required, the accuracy of these processing steps has an impact on IE tasks. All of these factors should be considered when collecting relevant training data and, if necessary, training our own models for IE.
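As an illustration of the lower pipeline stages, here is a minimal sketch using spaCy (assuming spacy and its small English model are installed):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Albert Einstein developed the theory of relativity in 1915.")

# Part-of-speech tags feed later stages such as entity recognition.
print([(token.text, token.pos_) for token in doc])
# Named entities extracted by the pipeline.
print([(ent.text, ent.label_) for ent in doc.ents])
```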
7. What are some metrics on which NLP models are evaluated?
Answer» The following are some metrics on which NLP models are commonly evaluated:
- Accuracy: the fraction of predictions the model got right.
- Precision, Recall, and F1 score: standard classification metrics, with F1 being the harmonic mean of precision and recall.
- AUC: area under the ROC curve, capturing ranking quality across decision thresholds.
- Perplexity: used to evaluate language models.
- BLEU and ROUGE: n-gram overlap metrics used for machine translation and summarisation, respectively.
- MRR and MAP: ranking metrics used in information retrieval.
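A minimal sketch computing some of these metrics with scikit-learn (the label arrays are illustrative assumptions):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1]  # gold labels
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
print(accuracy_score(y_true, y_pred), precision, recall, f1)
```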
8. What is the difference between NLP and NLU?
Answer» Natural Language Processing (NLP) is the broad discipline of enabling machines to process, analyse, and generate human language; it spans tasks from tokenisation and parsing to translation and text generation. Natural Language Understanding (NLU) is a subset of NLP focused on machine reading comprehension, that is, extracting the meaning and intent behind text. In short, NLU handles understanding, while NLP covers understanding along with processing and generation tasks that go beyond it.
9. What is Latent Semantic Indexing (LSI) in NLP?
Answer» Latent Semantic Indexing (LSI), also known as Latent Semantic Analysis, is a mathematical method for improving the accuracy of information retrieval. It aids in the discovery of hidden (latent) relationships between words (semantics) by generating a set of concepts associated with the terms of a phrase in order to increase information comprehension. Singular value decomposition is the NLP technique utilised for this aim. It is best suited to working with small groups of static documents.
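A minimal sketch of the LSI idea using scikit-learn, where truncated SVD is applied to a TF-IDF matrix (the toy documents and the number of concepts are illustrative assumptions):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The cat sat on the mat",
    "Dogs and cats are pets",
    "The stock market crashed today",
]
tfidf = TfidfVectorizer().fit_transform(docs)
# Singular value decomposition uncovers latent concepts.
lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)
print(lsi.shape)  # (3, 2): each document mapped to 2 latent concepts
```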
10. What do you mean by Parts of Speech (POS) tagging in NLP?
Answer» A Part-Of-Speech tagger (POS tagger) reads text in a language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other tokens). To label terms in text bodies, POS taggers employ an algorithm. With tags like "noun-plural" or even more complicated labels, these taggers can create finer-grained categories than the basic parts of speech.
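A minimal sketch of POS tagging with NLTK (assuming nltk is installed; the resource names below may vary between NLTK versions):

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ...]
```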
11. What do you mean by a Bag of Words (BOW)?
Answer» The Bag of Words model is a popular one that uses word frequency or occurrences to train a classifier. This methodology generates a matrix of occurrences for documents or phrases, regardless of their grammatical structure or word order. A bag-of-words is a text representation that describes the frequency with which words appear in a document. It entails two steps: building a vocabulary of known words, and measuring the presence (e.g., the count) of those known words in each document. Because any information about the order or structure of words in the document is discarded, it is referred to as a "bag" of words: the model only cares about whether recognised terms appear in the document, not where they appear.
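A minimal sketch of both steps using scikit-learn's CountVectorizer (the two example sentences are illustrative assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(bow.toarray())                       # word counts per document
```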
12. Explain how parsing is done in NLP.
Answer» Parsing is the process of identifying and understanding a text's syntactic structure. It is accomplished by examining the text's constituent pieces. The machine parses the text one word at a time, then two by two, three by three, and so on. When the system parses the text one word at a time, it is a unigram; a text parsed two words at a time is a bigram; and when the machine parses three words at a time, the set of words is called a trigram. The following points help illustrate the importance of parsing in NLP: it checks whether a sentence is grammatically well formed, it helps resolve structural ambiguity, and it provides the syntactic structure that downstream tasks such as information extraction and question answering build on.
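A minimal sketch of syntactic (dependency) parsing with spaCy (assuming spacy and its small English model are installed):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat")

# Each token is linked to its syntactic head with a dependency label.
for token in doc:
    print(token.text, token.dep_, token.head.text)
# e.g., "cat nsubj sat", "sat ROOT sat", ...
```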
13. What are the steps to follow when building a text classification system?
Answer» When creating a text classification system, the following steps are usually followed:
1. Collect or create a labelled dataset suitable for the task.
2. Split the dataset into training, validation, and test sets, and decide on the evaluation metric(s).
3. Pre-process and clean the raw text.
4. Transform the text into numeric feature vectors (feature engineering).
5. Train a classifier on the training set using the feature vectors and labels.
6. Evaluate and benchmark the model on the test set using the chosen metric(s).
7. Deploy the model and monitor its performance on real-world data.
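A minimal sketch covering the feature-engineering and training steps with scikit-learn (the tiny dataset and model choice are illustrative assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["great movie", "awful film", "loved it", "terrible acting"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Vectorizer (feature engineering) and classifier chained together.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)
print(model.predict(["loved this movie"]))  # expected: [1]
```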
14. What do you mean by TF-IDF in Natural language Processing?
Answer» TF-IDF, also called Term Frequency-Inverse Document Frequency, helps us gauge the importance of a particular word relative to other words in a document and in the corpus. It's a common scoring metric in information retrieval (IR) and summarisation. TF-IDF converts words into vectors, weighting each term by how frequent it is in a document (TF) and how rare it is across the corpus (IDF): tf-idf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. The resulting weights emphasise rare but informative words and may be utilised in a variety of NLP applications.
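A minimal sketch computing the formula above directly (the toy corpus and helper function are illustrative assumptions):

```python
import math

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)                # term frequency
    df = sum(1 for d in corpus if term in d)       # document frequency
    return tf * math.log(len(corpus) / df)

print(tf_idf("the", corpus[0], corpus))  # 0.0: appears in every document
print(tf_idf("cat", corpus[0], corpus))  # > 0: rarer, hence more informative
```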
15. What is an ensemble method in NLP?
Answer» An ensemble approach is a methodology that derives an output or makes predictions by combining numerous independent, similar or distinct models/weak learners. An ensemble can also be created by combining various models such as random forest, SVM, and logistic regression. Bias, variance, and noise, as we all know, have a negative impact on the errors and predictions of any machine learning model. Ensemble approaches are employed to overcome these drawbacks.
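A minimal sketch of an ensemble over text features with scikit-learn's VotingClassifier, combining the three model types the answer mentions (the data is an illustrative assumption):

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

texts = ["good product", "bad service", "excellent quality", "poor design"]
labels = [1, 0, 1, 0]
features = TfidfVectorizer().fit_transform(texts)

# Majority vote over three distinct learners.
ensemble = VotingClassifier([
    ("lr", LogisticRegression()),
    ("svm", SVC()),
    ("rf", RandomForestClassifier()),
])
ensemble.fit(features, labels)
```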
16. Explain the concept of Feature Engineering.
Answer» After a variety of pre-processing procedures and their applications, we need a way to feed the pre-processed text into an NLP algorithm when we employ ML methods to complete our modelling step. The set of strategies that achieve this goal is referred to as feature engineering. Feature extraction is another name for it. The purpose of feature engineering is to convert the text's qualities into a numeric vector that NLP algorithms can understand. This stage is called "text representation".
17. What is the meaning of Text Normalization in NLP?
Answer» Consider a situation in which we're operating on a set of social media posts to find information about events. Social media text can be very different from the language we'd see in, say, newspapers. A phrase may be spelt in multiple ways, including shortened forms (for instance, with and without hyphens), names are usually in lowercase, and so on. When we're developing NLP tools to work with such data, it's beneficial to reach a canonical representation of text that folds these kinds of variations into one representation. This is referred to as text normalization. Converting all text to lowercase or uppercase, converting digits to text (e.g., 7 to seven), expanding abbreviations, and so on are some frequent text normalization steps.
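A minimal sketch of such a normalization step (the digit map and abbreviation table are illustrative assumptions):

```python
import re

DIGITS = {"7": "seven", "2": "two"}           # digit-to-text mapping
ABBREVIATIONS = {"u": "you", "gr8": "great"}  # abbreviation expansion

def normalize(text):
    text = text.lower()                        # canonical casing
    words = re.findall(r"[a-z0-9']+", text)    # simple tokenisation
    words = [ABBREVIATIONS.get(w, w) for w in words]
    words = [DIGITS.get(w, w) for w in words]
    return " ".join(words)

print(normalize("U are GR8!"))  # "you are great"
```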