This section offers curated multiple-choice questions on Getting Data (Data Science) to sharpen your knowledge and support exam preparation.

1.

Point out the correct statement.
(a) gsub is used for fixing character vectors
(b) sub is used for finding values like grep
(c) grep is used for fixing character vectors
(d) none of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (a) gsub is used for fixing character vectors.

Explanation: Both sub and gsub are used for fixing character vectors.
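To see how the two functions differ, here is a minimal sketch in base R. The column names are hypothetical and chosen only for illustration.

```r
# Hypothetical column names containing underscores
col_names <- c("user_id_1", "test_score_2")

# sub() replaces only the first match in each element
first_only <- sub("_", "", col_names)

# gsub() replaces every match in each element
all_matches <- gsub("_", "", col_names)

print(first_only)   # "userid_1" "testscore_2"
print(all_matches)  # "userid1" "testscore2"
```

In practice gsub is the usual choice for cleaning names, while sub is handy when only a leading separator should go.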

2.

The httr package does not work well with the Facebook and Twitter APIs.
(a) True
(b) False
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (b) False.

Explanation: Most modern APIs use something like OAuth, which httr supports.

3.

Which of the following functions is used to read data off web pages?
(a) read.web
(b) read.Lines
(c) read.Line
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (b) read.Lines.

Explanation: readLines (spelled read.Lines in the options above) reads the text of a web page line by line.
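As a rough sketch of reading page text line by line, the example below uses base R's readLines on an in-memory connection. The HTML content is made up; with a real page you would open a connection with url() instead of textConnection(), which is an assumption about your setup and needs network access.

```r
# A small in-memory HTML page stands in for a real web page (hypothetical content)
html_source <- c(
  "<html>",
  "<head><title>Demo page</title></head>",
  "<body>Hello</body>",
  "</html>"
)

con <- textConnection(html_source)  # for a live page: con <- url("http://...")
page_lines <- readLines(con)        # one character string per line
close(con)

# Pick out the line that holds the title tag
title_line <- grep("<title>", page_lines, value = TRUE)
print(title_line)
```

readLines returns raw text only; parsing the HTML structure is a separate step.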

4.

Point out the wrong statement.
(a) Variables with character values should be made less descriptive
(b) Variables with character values should usually be made into factor variables
(c) Common variables are used to apply transforms
(d) All of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (a) Variables with character values should be made less descriptive.

Explanation: Variables with character values should be made more descriptive, not less.

5.

Which of the following functions gives information about top-level data?
(a) head
(b) tail
(c) summary
(d) none of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (a) head.

Explanation: The head function is very useful for working with lists, tables, data frames, and even functions.
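A short sketch of how head compares with tail and summary, using a made-up data frame:

```r
# Hypothetical data frame with ten rows
scores <- data.frame(id = 1:10, score = seq(50, 95, by = 5))

top_rows <- head(scores, n = 3)     # first three rows
bottom_rows <- tail(scores, n = 3)  # last three rows

print(top_rows)
print(bottom_rows)
print(summary(scores$score))        # min, quartiles, mean, max
```

By default head and tail return six rows; the n argument changes that.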

6.

Point out the correct statement.
(a) The XLConnect package has more options for manipulating Access files
(b) The XLConnect vignette package can also be used for manipulating Excel files
(c) write.xlsx writes out an Excel file with different arguments
(d) None of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (c) write.xlsx writes out an Excel file with different arguments.

Explanation: write.xlsx writes out an Excel file with similar arguments.

7.

Which of the following functions is used for loading flat files?
(a) read.data
(b) read.sheet
(c) read.table
(d) none of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (c) read.table.

Explanation: read.table reads the data into RAM.
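A minimal sketch of read.table follows. To keep it self-contained, the flat file is supplied as a string through the text argument; the contents are invented, and with a real file you would pass its path as file instead.

```r
# Hypothetical comma-separated content standing in for a flat file on disk
raw_text <- "name,age\nAda,36\nLinus,28"

# header = TRUE takes column names from the first line; sep sets the delimiter
people <- read.table(text = raw_text, header = TRUE, sep = ",",
                     stringsAsFactors = FALSE)

print(people)
str(people)
```

Because the whole table is held in memory, very large files may need to be read in chunks, for example with the nrows and skip arguments.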

8.

tidyr is a reframing of _______ designed to accompany the tidy data framework.
(a) reshape5
(b) dplyr
(c) reshape2
(d) all of the mentioned
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (c) reshape2.

Explanation: tidyr does less reframing than reshape2.

9.

Point out the correct statement.
(a) Primary data is the original source of data
(b) Secondary data is the original source of data
(c) Questions are obtained after data processing steps
(d) None of the mentioned
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (a) Primary data is the original source of data.

Explanation: Primary data is also referred to as raw data.

10.

Which of the following functions is used for fixing character vectors?
(a) tolower
(b) toUPPER
(c) toLOWER
(d) all of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (a) tolower.

Explanation: tolower translates characters to lowercase.

11.

Which of the following functions is used for quantiles of quantitative values?
(a) quantile
(b) quantity
(c) quantiles
(d) all of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (a) quantile.

Explanation: In probability and statistics, the quantile function specifies, for a given probability in the probability distribution of a random variable, the value at which the probability of the random variable being less than or equal to that value equals the given probability.
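As an illustration, the sketch below calls base R's quantile on a small invented vector, relying on the default interpolation method.

```r
# Hypothetical sample of quantitative values
values <- c(2, 4, 6, 8, 10)

# Default call reports the 0%, 25%, 50%, 75% and 100% quantiles
print(quantile(values))

# probs selects specific quantiles
q <- quantile(values, probs = c(0.25, 0.5, 0.75))
print(q)  # 4, 6 and 8 for this vector
```

Adding na.rm = TRUE is usually necessary when the data contain missing values.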

12.

Point out the correct statement.
(a) The head function works on strings
(b) The tail function works on strings
(c) The head function works on strings but the tail function does not
(d) none of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (d) none of the mentioned.

Explanation: Neither the head nor the tail function works on strings.

13.

Which of the following is used to extract data from the HTML code of websites?
(a) Webscraping
(b) Webdredging
(c) Webcleaning
(d) All of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) Webscraping.

Explanation: Webscraping is a great way to get data.

14.

A strange binary file generated by a machine is an example of tidy data.
(a) True
(b) False
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (b) False.

Explanation: Data sets stored in spreadsheets, such as Microsoft's Excel, are binary, not raw ASCII data files.

15.

Which type of data is generated by a POS terminal in a busy supermarket each day?
(a) Source
(b) Processed
(c) Synchronized
(d) All of the mentioned
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (a) Source.

Explanation: Raw data is sometimes referred to as source data.

16.

Processing data includes subsetting, formatting and merging only.
(a) True
(b) False
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (b) False.

Explanation: Many other techniques are also applied to raw data.

17.

Which of the following joins is used by default in the plyr package?
(a) left
(b) right
(c) full
(d) all of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (a) left.

Explanation: Join is faster in the plyr package.
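plyr's join() takes a type argument that defaults to "left". So that it runs without installing plyr, the sketch below shows what a left join does using base R's merge() instead, which is a swapped-in equivalent rather than the plyr call itself. The two data frames are invented.

```r
# Hypothetical tables sharing a customer_id key
orders <- data.frame(customer_id = c(1, 2, 3), amount = c(20, 35, 50))
customers <- data.frame(customer_id = c(1, 2), name = c("Ada", "Linus"))

# all.x = TRUE keeps every row of the left table (a left join);
# rows with no match on the right get NA in the right-hand columns
left_joined <- merge(orders, customers, by = "customer_id", all.x = TRUE)

print(left_joined)
```

Customer 3 has no entry in customers, so its name is NA in the result while its order is retained.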

18.

Point out the wrong statement.
(a) Common variables are used to create missingness vectors
(b) Common variables are used for cutting up quantitative variables
(c) Common variables are not used to apply transforms
(d) All of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (c) Common variables are not used to apply transforms.

Explanation: Common variables are used to apply transforms, so statement (c) is the wrong one.

19.

Which of the following packages is used for reading JSON data?
(a) jsonlite
(b) json
(c) jsondata
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) jsonlite.

Explanation: The jsonlite package is a JSON parser and generator optimized for the web.

20.

Which of the following is an important parameter of the read.table function?
(a) file
(b) header
(c) sep
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (d) all of the mentioned.

Explanation: All three are read.table parameters, and further parameters are often needed when loading data.

21.

Which of the following functions is used for searching text strings by means of a regular expression?
(a) grepd
(b) grepl
(c) gepexpr
(d) all of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (b) grepl.

Explanation: grep, grepl, regexpr, gregexpr and regexec search for matches to the pattern argument within each element of a character vector.

22.

Each observation forms a column in tidy data.
(a) True
(b) False
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (b) False.

Explanation: In tidy data, each variable forms a column and each observation forms a row.

23.

Point out the correct statement.
(a) Data has only qualitative value
(b) Data has only quantitative value
(c) Data has both qualitative and quantitative value
(d) None of the mentioned
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (c) Data has both qualitative and quantitative value.

Explanation: Data is a set of values of qualitative or quantitative variables.

24.

Which of the following metacharacters is used to refer to any character?
(a) %
(b) @
(c) .
(d) All of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (c) .

Explanation: In a regular expression, the dot matches any single character. In an R function name, by contrast, a dot can mean nothing at all or act as the separator between method and class in an S3 method.
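A small sketch of the dot metacharacter with base R's grepl, using invented strings. It also shows that the dot must be escaped when a literal period is meant.

```r
words <- c("cat", "cot", "ct", "coat")

# "c.t" means: c, then exactly one character of any kind, then t
any_char <- grepl("c.t", words)
print(any_char)  # TRUE TRUE FALSE FALSE

# An unescaped dot also matches a digit, so "3214" matches "3.14"
unescaped <- grepl("3.14", c("3.14", "3214"))
print(unescaped)  # TRUE TRUE

# Escaping the dot restricts the match to a literal period
escaped <- grepl("3\\.14", c("3.14", "3214"))
print(escaped)  # TRUE FALSE
```

Passing fixed = TRUE to grepl is another way to treat the whole pattern literally.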

25.

Which of the following requests can be issued from the httr package?
(a) GET
(b) PUT
(c) DELETE
(d) All of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (d) All of the mentioned.

Explanation: httr provides GET, PUT and DELETE functions, among others. Many APIs also require authentication before a request is accepted.

26.

Which of the following is the most common problem with messy data?
(a) Column headers are values
(b) Variables are stored in both rows and columns
(c) A single observational unit is stored in multiple tables
(d) All of the mentioned
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (d) All of the mentioned.

Explanation: Real datasets can, and often do, violate the three precepts of tidy data in almost every way imaginable.

27.

Which of the following functions is good for the automatic splitting of names?
(a) split
(b) strsplit
(c) autsplit
(d) none of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (b) strsplit.

Explanation: strsplit splits a character string or vector of character strings using a regular expression or a literal string.

28.

Which of the following packages is used for reading Excel data?
(a) xlsx
(b) xlsc
(c) read.sheet
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) xlsx.

Explanation: The read.xlsx and read.xlsx2 functions are part of the xlsx package.

29.

Which of the following is a trait of tidy data?
(a) each variable in one column
(b) each observation in different row
(c) one table for each kind of variable
(d) none of the mentioned
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (b) each observation in different row.

Explanation: In tidy data, each observation forms its own row.

30.

Which of the following blocks of information is the odd one out?
(a) Subsetting
(b) Raw data
(c) Ready for analysis
(d) None of the mentioned
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (b) Raw data.

Explanation: The characteristics mentioned in the diagram are traits of processed data.

31.

Which of the following functions is used for casting data frames?
(a) dcast
(b) ucast
(c) rcast
(d) all of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (a) dcast.

Explanation: Use acast or dcast depending on whether you want vector/matrix/array output or data frame output.

32.

Which of the following can be used to view all the tables in memory?
(a) tables
(b) alltable
(c) table
(d) none of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) tables.

Explanation: The tables function from the data.table package lists all data.tables in memory. It is distinct from the base table function, which is a very basic but essential function to master for interactive data analysis.

33.

Point out the wrong statement.
(a) data.table inherits from data.frame
(b) data.table is written in Java
(c) data.table is faster at subsetting and updating data
(d) none of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (b) data.table is written in Java.

Explanation: data.table is written in C, not Java.

34.

Data that summarize all observations in a category are called __________ data.
(a) frequency
(b) summarized
(c) raw
(d) none of the mentioned
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (b) summarized.

Explanation: The summary could be the sum of the observations, the number of occurrences, their mean value, and so on.

35.

Which of the following functions is used for determining missing values?
(a) any
(b) all
(c) is
(d) all of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (d) all of the mentioned.

Explanation: In R, missing values are represented by the symbol NA.
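One way these functions combine in practice is sketched below, on an invented vector. It assumes the "is" option refers to is.na.

```r
# Hypothetical measurements with two missing entries
x <- c(4, NA, 7, NA)

missing_flags <- is.na(x)       # element-wise test for NA
has_missing <- any(is.na(x))    # TRUE if at least one value is missing
all_missing <- all(is.na(x))    # TRUE only if every value is missing
n_missing <- sum(is.na(x))      # count of missing values

print(missing_flags)  # FALSE TRUE FALSE TRUE
print(has_missing)    # TRUE
print(all_missing)    # FALSE
print(n_missing)      # 2
```

Comparing with == NA does not work for this purpose, because any comparison with NA yields NA.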

36.

Which of the following is an example of tidy data?
(a) complicated JSON from the Facebook API
(b) complicated JSON from the Twitter API
(c) unformatted Excel file
(d) all of the mentioned
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (d) all of the mentioned.

Explanation: Tidy data is obtained after running a processing script.

37.

Which of the following functions programmatically extracts parts of an XML file?
(a) XmlSApply
(b) XmlApply
(c) XmlSApplyData
(d) All of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) XmlSApply.

Explanation: xmlApply and xmlSApply (as the XML package spells them) are simple wrappers for the lapply and sapply functions.

38.

Which of the following sets the character that represents a missing value?
(a) na.quote
(b) na.strings
(c) nrows
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (b) na.strings.

Explanation: na.strings takes a character vector of strings to be interpreted as NA.
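A brief sketch of na.strings with read.table follows. The file contents and the two missing-value markers are invented for illustration.

```r
# Hypothetical readings where "missing" and "-999" both mark absent values
raw_text <- "id,temp\n1,21.5\n2,missing\n3,-999"

readings <- read.table(text = raw_text, header = TRUE, sep = ",",
                       na.strings = c("missing", "-999"))

print(readings)
print(sum(is.na(readings$temp)))  # 2
```

Because the markers become NA before type conversion, the temp column is still read as numeric rather than character.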

39.

Regular expressions can be thought of as a combination of literals and metacharacters.
(a) True
(b) False
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (a) True.

Explanation: Regular expressions have a rich set of metacharacters.

40.

Which of the following transforms can be performed on data values?
(a) log2
(b) cos
(c) log10
(d) all of the mentioned
Topic: Summarizing and Merging Data (Getting Data, Data Science)

Answer»

The correct answer is (d) all of the mentioned.

Explanation: Many common transforms can be applied to data in R.
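The three transforms named in the options are all vectorized base R functions. A minimal sketch on invented values:

```r
powers_of_ten <- c(1, 10, 100)
powers_of_two <- c(1, 2, 8)

log10_values <- log10(powers_of_ten)  # 0 1 2
log2_values <- log2(powers_of_two)    # 0 1 3
cos_zero <- cos(0)                    # 1 (argument is in radians)

print(log10_values)
print(log2_values)
print(cos_zero)
```

Each function is applied element by element, so an entire data frame column can be transformed in one call.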

41.

Which of the following loads data from SPSS?
(a) read.spss(SPSS)
(b) read.oct(SPSS)
(c) read.xpot(SPSS)
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) read.spss(SPSS).

Explanation: SPSS is a comprehensive and flexible statistical analysis and data management solution. The read.spss function is provided by the foreign package.

42.

Which of the following packages is used for tidy data?
(a) tidyr
(b) souryr
(c) NumPy
(d) all of the mentioned
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (a) tidyr.

Explanation: tidyr is used for tidying data with its spread and gather functions.

43.

Which of the following packages is used for reading HTML and XML data?
(a) httr
(b) http
(c) httx
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) httr.

Explanation: httr contains tools for working with URLs and HTTP.

44.

Point out the wrong statement.
(a) hdf5 can be used for reading/writing from disc in Python
(b) rhdf5 is an interface for the HDF5 format
(c) the maximum size of an HDF5 dataset is fixed when it is created
(d) all of the mentioned
Topic: Reading from Web and APIs (Getting Data, Data Science)

Answer»

The correct answer is (a) hdf5 can be used for reading/writing from disc in Python.

Explanation: In this context, hdf5 is used for reading/writing from disc in R, through the rhdf5 interface.

45.

Which of the following processes involves structuring datasets to facilitate analysis?
(a) Data tidying
(b) Data mining
(c) Data booting
(d) All of the mentioned
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (a) Data tidying.

Explanation: The principles of tidy data provide a standard way to organize data values within a dataset.

46.

Which of the following kinds of data is put into a formula to produce commonly accepted results?
(a) Raw
(b) Processed
(c) Synchronized
(d) All of the mentioned
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (b) Processed.

Explanation: Raw data comes from direct measurements.

47.

Which of the following is an example of raw data?
(a) original swath files generated from a sonar system
(b) initial time-series file of temperature values
(c) a real-time GPS-encoded navigation file
(d) all of the mentioned
Topic: Raw and Processed Data (Getting Data, Data Science)

Answer»

The correct answer is (d) all of the mentioned.

Explanation: Raw data refers to data that have not been changed since acquisition.

48.

Which of the following signs is used to indicate repetition?
(a) #
(b) *
(c) –
(d) All of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (b) *

Explanation: * and + are the metacharacters for repetition.
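The difference between the two repetition metacharacters can be sketched with grepl on a few invented strings: * allows zero or more repeats of the preceding item, while + requires at least one.

```r
strings <- c("ac", "abc", "abbbc")

# b* : zero or more b's between a and c
star_match <- grepl("^ab*c$", strings)
print(star_match)  # TRUE TRUE TRUE

# b+ : one or more b's between a and c
plus_match <- grepl("^ab+c$", strings)
print(plus_match)  # FALSE TRUE TRUE
```

The ^ and $ anchors tie the pattern to the whole string so that partial matches do not hide the difference.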

49.

Point out the wrong statement.
(a) Tidy datasets are all alike but every messy dataset is messy in its own way
(b) Most statistical datasets are data frames made up of rows and columns
(c) Tidy datasets provide a standardized way to link the structure of a dataset with its semantics
(d) None of the mentioned
Topic: Tidy Data (Getting Data, Data Science)

Answer»

The correct answer is (d) None of the mentioned.

Explanation: The tidy data standard has been designed to simplify the development of data analysis tools that work well together.

50.

Which of the following is used for specifying a character class with metacharacters?
(a) []
(b) {}
(c) /+
(d) All of the mentioned
Topic: Regular Expressions and Text Variables (Getting Data, Data Science)

Answer»

The correct answer is (a) []

Explanation: You can list a set of characters to accept at a given point in the match.
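A short sketch of character classes with grepl, using invented identifiers. It also shows a range and a negated class.

```r
ids <- c("a1", "b2", "c3", "7z")

# [ab] accepts a or b at that position; [0-9] accepts any single digit
letter_then_digit <- grepl("^[ab][0-9]$", ids)
print(letter_then_digit)  # TRUE TRUE FALSE FALSE

# A leading ^ inside the brackets negates the class: first character is not a digit
starts_non_digit <- grepl("^[^0-9]", ids)
print(starts_non_digit)  # TRUE TRUE TRUE FALSE
```

Note that ^ means "start of string" outside the brackets and "not" when it is the first character inside them.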