27+ Interview Questions on Unstructured Data Classification (Database), InterviewSolution

1.	Select the correct statement about nonlinear classification.
(1) The concept of slack variables is used in SVM for nonlinear classification
(2) The kernel trick is used in SVM for nonlinear classification
(3) Kernel tricks are used by nonlinear classifiers to achieve maximum-margin hyperplanes
Answer: (2) The kernel trick is used in SVM for nonlinear classification
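To illustrate why the kernel trick works, here is a minimal sketch assuming a degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2-D inputs; the helper names `phi`, `dot`, and `poly_kernel` are hypothetical. The kernel computed in the original 2-D space gives the same value as a dot product in an explicit 3-D feature space, which is what lets an SVM find a linear, maximum-margin boundary in that feature space without ever constructing it.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel evaluated directly in the 2-D input space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (2.0, 3.0)
print(poly_kernel(x, z))     # computed without building phi(x) or phi(z)
print(dot(phi(x), phi(z)))   # same value (up to rounding) via the explicit 3-D mapping
```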

2.	A classifier that can work with both numeric and categorical values is __________
(1) Naive Bayes classifier
(2) Decision tree classifier
(3) SVM classifier
(4) Random forest classifier
Answer: (4) Random forest classifier

3.	An algorithm that counts how many times a word appears in a document is __________
(1) Bag-of-Words (BoW)
(2) TF-IDF
(3) TDM (term-document matrix)
(4) DTM (document-term matrix)
Answer: (1) Bag-of-Words (BoW)
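A minimal bag-of-words sketch, assuming lowercase text split on whitespace with no punctuation handling; `bag_of_words` is a hypothetical helper. scikit-learn's `CountVectorizer` builds the same kind of counts across a whole corpus, though with its own tokenization rules.

```python
from collections import Counter

def bag_of_words(document):
    """Count how many times each lowercase, whitespace-separated token appears."""
    return Counter(document.lower().split())

counts = bag_of_words("The spam filter flagged the spam message")
print(counts["spam"], counts["the"], counts["filter"])  # → 2 2 1
```

Word order is discarded: only the per-word counts survive, which is exactly what makes the representation a "bag".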

4.	All of the following are pre-processing methods used in unstructured data classification, except _________
(1) Confusion matrix
(2) Stop word removal
(3) Stemming
(4) Lemmatization
Answer: (1) Confusion matrix

5.	TF and IDF use matrix representations.
(1) False
(2) True
Answer: (2) True

6.	Can sentiment classification be considered a text classification problem?
(1) Yes
(2) No
Answer: (1) Yes

7.	Inverse document frequency is used in the term-document matrix.
(1) True
(2) False
Answer: (2) False
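A minimal sketch, assuming whitespace tokenization and a hypothetical helper `term_document_matrix`. In its basic form each cell of a term-document matrix holds a raw count; IDF is a separate weighting applied on top of such counts (as in TF-IDF) rather than part of the matrix itself. scikit-learn's `CountVectorizer` returns the transposed layout, a document-term matrix with one row per document.

```python
from collections import Counter

def term_document_matrix(docs):
    """Rows are terms, columns are documents; cells hold raw counts (no IDF weighting)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))  # every distinct term across all documents
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

docs = ["win cash now", "meeting moved to noon", "win a free cash prize"]
vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:>8} {row}")
```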

8.	Identify the stop word(s) from the following.
(1) Both "the" and "it"
(2) "the"
(3) "fragment"
(4) "it"
(5) "computer"
Answer: (1) Both "the" and "it"

9.	Which NLP technique uses a lexical knowledge base to obtain the correct base form of words?
(1) Lemmatization
(2) Tokenization
(3) Object standardization
(4) Stop word removal
Answer: (1) Lemmatization
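A toy comparison of stemming and lemmatization. The small `LEXICON` dictionary is a hypothetical stand-in for a real lexical knowledge base such as WordNet, and `naive_stem` is a deliberately crude suffix stripper, not the Porter algorithm. Dictionary lookup can map irregular forms like "mice" to "mouse", while suffix rules can produce non-words like "stud".

```python
# A toy lexicon standing in for a real lexical knowledge base such as WordNet.
LEXICON = {"better": "good", "mice": "mouse", "running": "run", "studies": "study"}

def naive_stem(word):
    """Crude rule-based suffix stripping; no dictionary, so results may not be real words."""
    for suffix in ("ing", "ies", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Dictionary lookup returns a proper base form; unknown words come back unchanged."""
    return LEXICON.get(word, word)

for w in ["studies", "running", "mice", "better"]:
    print(w, naive_stem(w), lemmatize(w))
```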

10.	A classification in which each data instance is mapped to more than one class is called ___________
(1) Multi-class classification
(2) Binary classification
(3) Multi-label classification
Answer: (3) Multi-label classification
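Multi-label targets are commonly encoded as one 0/1 indicator column per class, where a row may contain several 1s; in multi-class classification each row would contain exactly one. A minimal sketch with a hypothetical `binarize` helper and made-up article topics; scikit-learn's `MultiLabelBinarizer` produces a similar encoding.

```python
def binarize(label_sets, classes):
    """Encode each sample's set of labels as a 0/1 indicator row over a fixed class list."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]

classes = ["sports", "politics", "finance"]
# Multi-label: one article can belong to several classes at once.
articles = [{"sports"}, {"politics", "finance"}, {"sports", "finance"}]
print(binarize(articles, classes))  # → [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
```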

11.	High classification accuracy always indicates a good classifier.
(1) False
(2) True
Answer: (1) False
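A minimal sketch of why accuracy alone can mislead on imbalanced data. The 95/5 ham/spam split is hypothetical, and `accuracy` and `recall` are hand-written helpers. A classifier that never predicts spam still scores 95% accuracy while catching none of the spam.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of actual positives that the classifier labelled as positive."""
    preds_for_actual_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_actual_pos) / len(preds_for_actual_pos)

# Hypothetical imbalanced data: 95 ham messages, 5 spam messages.
y_true = ["ham"] * 95 + ["spam"] * 5
# A useless classifier that labels everything as ham.
y_pred = ["ham"] * 100

print(accuracy(y_true, y_pred))        # → 0.95
print(recall(y_true, y_pred, "spam"))  # → 0.0
```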

12.	Pruning is a technique associated with __________
(1) SVM
(2) Decision trees
(3) Logistic regression
(4) Linear regression
Answer: (2) Decision trees

13.	Choose the correct sequence for building a classifier.
(1) None of the options
(2) Initialize -> Evaluate -> Train -> Predict
(3) Initialize -> Train -> Predict -> Evaluate
(4) Train -> Test -> Initialize -> Predict
Answer: (3) Initialize -> Train -> Predict -> Evaluate
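The Initialize -> Train -> Predict -> Evaluate sequence can be sketched with a from-scratch multinomial Naive Bayes spam classifier. The class, its method names, and the four training messages are hypothetical; in scikit-learn the same steps correspond to constructing an estimator, calling `fit`, calling `predict`, and scoring with a function from `sklearn.metrics`.

```python
import math
from collections import Counter

class NaiveBayesText:
    """Minimal multinomial Naive Bayes with Laplace smoothing (a sketch, not a library API)."""

    def __init__(self, alpha=1.0):                 # 1. Initialize
        self.alpha = alpha

    def train(self, docs, labels):                 # 2. Train
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_score(self, doc, c):
        """Log prior plus smoothed log likelihood of each token under class c."""
        total = sum(self.word_counts[c].values())
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        for word in doc.lower().split():
            count = self.word_counts[c][word]
            score += math.log((count + self.alpha) / (total + self.alpha * len(self.vocab)))
        return score

    def predict(self, docs):                       # 3. Predict
        return [max(self.classes, key=lambda c: self._log_score(doc, c)) for doc in docs]

def evaluate(y_true, y_pred):                      # 4. Evaluate (accuracy)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train_docs = ["win cash now", "free prize win", "lunch at noon", "project meeting moved"]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesText(alpha=1.0).train(train_docs, train_labels)
predictions = model.predict(["free cash prize", "meeting at noon"])
print(predictions, evaluate(["spam", "ham"], predictions))  # → ['spam', 'ham'] 1.0
```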

14.	A technique that depicts classifier performance as a table with two dimensions, actual and predicted values, is ___________
(1) Confusion matrix
(2) Classification accuracy
(3) Cross-validation
(4) Classification report
Answer: (1) Confusion matrix
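A minimal sketch of how a confusion matrix is tallied, assuming rows are actual classes and columns are predicted classes (the same layout `sklearn.metrics.confusion_matrix` uses); `build_confusion_matrix` and the labels are hypothetical. With "spam" as the positive class, the bottom-right cell counts true positives and the top-left cell counts true negatives.

```python
from collections import Counter

def build_confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes, both in the order of `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "ham", "spam"]
labels = ["ham", "spam"]
for label, row in zip(labels, build_confusion_matrix(y_true, y_pred, labels)):
    print(label, row)
# → ham [2, 1]
# → spam [1, 2]
```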

15.	In document classification, each document has to be converted from full text to a document vector.
(1) True
(2) False
Answer: (1) True

16.	Which numerical statistic is used to identify the importance of a rare word in a document?
(1) None of the options
(2) TF-IDF
(3) DF
(4) TF
Answer: (2) TF-IDF
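A minimal TF-IDF sketch, assuming tf = term count / document length and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term; `tf_idf` and the example documents are hypothetical. A word that appears in every document gets weight 0, while a word confined to one document gets a positive weight. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with tf = count / length and idf = log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights

docs = ["the meeting is at noon", "the prize is free", "the lottery jackpot is yours"]
w = tf_idf(docs)
print(round(w[2]["jackpot"], 3), w[2]["the"])
```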

17.	Choose the correct sequence of steps in unstructured data classification.
(1) Data Analysis -> Pre-Processing -> Predict -> Train
(2) Pre-Processing -> Model Building -> Predict
(3) Data Analysis -> Pre-Processing -> Model Building -> Predict
(4) Pre-Processing -> Predict -> Train
Answer: (3) Data Analysis -> Pre-Processing -> Model Building -> Predict

18.	Email spam data is an example of __________
(1) Unstructured data
(2) Structured data
Answer: (1) Unstructured data

19.	SVM is a _____________
(1) Supervised learning algorithm
(2) Semi-supervised learning algorithm
(3) Unsupervised learning algorithm
(4) Weakly supervised learning algorithm
Answer: (1) Supervised learning algorithm

20.	Clustering is supervised classification.
(1) True
(2) False
Answer: (2) False

21.	What is the purpose of lemmatization?
(1) To convert a sentence into words
(2) To convert words into their proper base form
(3) To remove redundant words
(4) To split text into sentences
Answer: (2) To convert words into their proper base form

22.	The most widely used package for machine learning in Python is _________
(1) Django
(2) Pillow
(3) Bottle
(4) sklearn
Answer: (4) sklearn (scikit-learn)

23.	A higher value of which of the following hyperparameters is better for the decision tree algorithm?
(1) Cannot say
(2) Samples per leaf
(3) Depth of tree
(4) Number of samples used for a split
Answer: (1) Cannot say

24.	A true positive occurs when neither the predicted instance nor the actual instance is negative.
(1) True
(2) False
Answer: (1) True

25.	What kind of classification is our case study, "Spam Detection"?
(1) Multi-class
(2) Binary
(3) Multi-label
Answer: (2) Binary

26.	Which of the following hyperparameters, when increased, may cause a random forest to overfit the data?
(1) Depth of tree
(2) Learning rate
(3) Number of trees
Answer: (1) Depth of tree

27.	Which pre-processing technique is used to remove the most commonly used words?
(1) Lemmatization
(2) Tokenization
(3) Stop word removal
Answer: (3) Stop word removal
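A minimal stop-word removal sketch. The `STOP_WORDS` set here is a tiny illustrative list (an assumption), whereas real lists such as scikit-learn's `ENGLISH_STOP_WORDS` or NLTK's stopwords corpus are much longer. Common words such as "the" and "it" are dropped, leaving the content-bearing tokens.

```python
# A tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"the", "it", "is", "a", "an", "and", "of", "to", "in"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop tokens found in STOP_WORDS."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

print(remove_stop_words("It is the best offer in the world"))  # → ['best', 'offer', 'world']
```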
