1.

legendTest

Answer»

Response:

Correct answer is Option b – bwplot()

bwplot() draws box-and-whisker plots for numerical variables. It is part of the lattice package in R.

Below is an example of a box and whisker plot using the singer dataset.

library(lattice)

require(stats)

#bwplot
bwplot(voice.part ~ height, data=singer, xlab="Height (inches)")

plot() is used for generic x-y plotting.

xyplot() produces bivariate scatterplots or time-series plots.

#xyplot
## Tonga Trench Earthquakes
Depth <- equal.count(quakes$depth, number=8, overlap=.1)
xyplot(lat ~ long | Depth, data = quakes)

dotplot() produces Cleveland dot plots.
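As a rough sketch of dotplot(), the call below draws a Cleveland dot plot with lattice. The barley dataset ships with lattice; picking it, the variables and the name barley_dots is an assumption made for illustration and is not part of the original answer.

```r
library(lattice)

# Cleveland dot plot: one dot per barley variety showing its yield,
# with one panel per growing site and a separate colour for each year.
barley_dots <- dotplot(variety ~ yield | site, data = barley,
                       groups = year, auto.key = TRUE,
                       xlab = "Yield (bushels/acre)")
print(barley_dots)
```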

2.

legendTest <- ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) + geom_boxplot()

Answer»

Response:

The correct answers are options c and a. Both table() and xtabs() can be used to accomplish this.

table() uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels.

xtabs() also creates a contingency table (optionally a sparse matrix) from cross-classifying factors, usually contained in a data frame, using a formula interface.

list() is used to construct, coerce and check for both kinds of R lists.

stem() produces a stem-and-leaf plot of the values, which serves a different purpose from what is requested here. Its “scale” parameter can be used to expand the scale of the plot.
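To make the comparison concrete, here is a minimal sketch that builds the same contingency table both ways. The mtcars dataset and the choice of the cyl and gear columns are assumptions made for illustration; the original question's data is not shown.

```r
# Cross-tabulate number of cylinders against number of gears in mtcars.

# table(): pass the factors (here, two columns) directly.
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# xtabs(): same counts, but specified through a formula and a data frame.
xt <- xtabs(~ cyl + gear, data = mtcars)

print(tab)
print(xt)
```

Both calls return the same counts; xtabs() is often more convenient when the factors already live in a data frame.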

3.

library(ggplot2)

Answer»

Response:

The answer is Option A.

Yes, trend lines can be added to a plot in R.

The example below draws a histogram of the iris dataset in R and adds a vertical line at the mean of the variable, which serves as a threshold.

The ggplot2 library in R is leveraged for this purpose.

library(ggplot2)

ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_histogram(fill = "lightblue", col = "blue") +
  geom_vline(xintercept = mean(iris$Sepal.Length), color = "red", linetype = "longdash")

The function geom_vline (the “v” stands for vertical) draws the line. We only need to provide the intercept on the x-axis. Here the mean of Sepal.Length is used as the threshold that determines where the line is drawn. The type of line can also be set with the “linetype” parameter.
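The vertical line above marks a threshold rather than a fitted trend. For a fitted trend line, one possible sketch uses geom_smooth() from ggplot2 with a linear model. The choice of the Petal.Length and Sepal.Length columns and the name trend_plot are assumptions for illustration only.

```r
library(ggplot2)

# Scatter plot of two iris measurements with a straight-line trend on top.
# method = "lm" fits a linear regression; se = FALSE hides the confidence band.
trend_plot <- ggplot(data = iris, aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(colour = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              colour = "red", linetype = "longdash")
print(trend_plot)
```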

4.

Please consider the built-in “PlantGrowth” dataset in R. The goal is to remove the legend shown in the box plot below (the legend for group, which has 3 values). Select all correct options that can be used to remove the legend in the box plot.

Answer»

Response:

Options a, b, c and d are all correct. All of these can be used to remove the legend.

We use legendTest + guides(fill=FALSE) to remove the legend for a particular aesthetic. Option b achieves the same thing through the scale_fill_discrete() function when specifying the scale.

Option c, legendTest + theme(legend.position="none"), removes all legends in the plot.

Option d has a syntax similar to option a and also removes the legend.
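The approaches named in this answer can be sketched as follows. The exact text of options b and d is not shown in the question, so the scale_fill_discrete() call here is an assumption about how that option would be written. Recent ggplot2 releases deprecate guides(fill=FALSE) in favour of guides(fill="none"), which is the form used in this sketch.

```r
library(ggplot2)

# Box plot from the question; mapping fill = group is what creates the legend.
legendTest <- ggplot(data = PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot()

# Remove the legend for the fill aesthetic only.
no_legend_guides <- legendTest + guides(fill = "none")

# Remove it while specifying the fill scale.
no_legend_scale <- legendTest + scale_fill_discrete(guide = "none")

# Remove every legend in the plot through the theme.
no_legend_theme <- legendTest + theme(legend.position = "none")

print(no_legend_theme)
```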

5.

What is chartjunk? Explain three common types of chartjunk.

Answer»

Response:

Chartjunk refers to visual elements in charts, plots, graphs etc. that are not needed to convey the information, or that distract the viewer from it.

Professor Edward Tufte coined the term and summed it up as “style over substance”: the interior decoration of graphics generates a lot of ink that does not tell the viewer anything new.

Three common types of chartjunk are as follows:

  • Unintentional optical art
  • The dreaded grid
  • The self-promoting graphical duck

An example of unintentional optical art is shown below.

Such patterns produce illusions and unwanted visual effects instead of conveying the data.

An example of the dreaded grid is shown below.

Gridlines carry no information of their own, and dark gridlines are chartjunk. If gridlines are needed, they should be light grey.

Chartjunk is created primarily for the following reasons:

  • Lack of quantitative skills of professional artists
  • The belief that statistical data are boring
  • The belief that graphics are only for the unsophisticated reader

6.

What type of charts are to be considered when we are trying to demonstrate “relationship” between variables/parameters?

Answer»

Response:

When we are trying to show a “relationship” between two variables, we use a scatter plot or chart. When we are trying to show a “relationship” between three variables, we can use a bubble chart, in which the third variable sets the size of each point. An illustration is shown below.

“Relationship between two variables” – scatter chart:

“Relationship between three variables” – bubble chart:
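A bubble chart can be sketched in base R with symbols(). The mtcars dataset, the three columns chosen and the scaling of the radius are all assumptions made for illustration.

```r
# Bubble chart: weight vs. mileage, with bubble size driven by horsepower.
# The radius is the square root of hp so that area, not radius, tracks the value.
bubble_radius <- sqrt(mtcars$hp / pi)

symbols(mtcars$wt, mtcars$mpg, circles = bubble_radius, inches = 0.25,
        bg = "lightblue", fg = "blue",
        xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
        main = "mpg vs weight, bubble size = horsepower")
```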

7.

Can plots be exported as image files or other file formats in R? Explain briefly.

Answer»

Response:

We can easily save plots as images directly from an editor such as RStudio. This way of saving, however, does not provide much flexibility. To customize the images, we need to export the plots from the R code itself.

We can use the “ggsave” function from ggplot2 to accomplish this.

We can save plots in different formats such as jpeg, tiff, pdf, svg etc. We can also use various parameters to change the size of the image before saving it to a path or location.

# Image_plot is assumed to be a ggplot object created earlier

# Saving as jpeg format

ggsave(filename = "PlotName1.jpeg", plot = Image_plot)

# Saving as tiff format

ggsave(filename = "PlotName1.tiff", plot = Image_plot)

# Saving as pdf format

ggsave(filename = "PlotName1.pdf", plot = Image_plot)

# Saving as tiff format with change in size

ggsave(filename = "PlotName1.tiff", plot = Image_plot, width = 14, height = 10, units = "cm")
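ggsave works with ggplot2 plots. For plots drawn with base graphics, a comparable sketch opens a graphics device, draws the plot, then closes the device. The temporary file path and the use of the nottem dataset are assumptions for illustration.

```r
# Write a base-graphics plot to a temporary PDF file.
out_path <- tempfile(fileext = ".pdf")

pdf(file = out_path, width = 7, height = 5)  # width and height are in inches
plot(nottem, main = "Average monthly temperature at Nottingham")
invisible(dev.off())                         # closing the device finishes the file

file.exists(out_path)
```

Other devices such as png(), jpeg() and tiff() follow the same open, draw, close pattern.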

8.

What is a time series plot? Explain using an example.

Answer»

A time-series plot shows measurements in the order in which they were taken. Time is represented along the x-axis, while the variable of interest is plotted on the y-axis. For many kinds of data, environmental observations among them, looking at the temporal pattern can be extremely useful for gaining insight into their behaviour.

The time variable is often overlooked. However, time series are extremely useful for determining the temporal pattern of a variable.

We take the example of a sample dataset called “nottem” in R, which captures average monthly temperatures at Nottingham from 1920 to 1939.

str(nottem)
head(nottem)
plot(nottem)

The chart shows the average monthly temperature of the city over a period of 20 years.
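As a small follow-up sketch, the time attributes of nottem can be inspected and a shorter span plotted to see the seasonal cycle more clearly. The choice of the 1930s window and the name nottem_1930s are assumptions for illustration.

```r
# nottem is a monthly time series (class "ts") built into R.
frequency(nottem)   # observations per year
start(nottem)       # first year and month
end(nottem)         # last year and month

# Plot only the 1930s to look at the seasonal pattern more closely.
nottem_1930s <- window(nottem, start = c(1930, 1), end = c(1939, 12))
plot(nottem_1930s, xlab = "Year", ylab = "Average temperature",
     main = "Nottingham monthly temperatures, 1930-1939")
```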

9.

When will you use a histogram and when will you use a bar chart in R? Explain with an example by leveraging R package.

Answer»

We use a histogram to plot the distribution of a continuous variable, while we use a bar chart to plot the distribution of a categorical variable.

Let us take the example of the iris dataset in R.

We will plot a histogram from the iris dataset using the “ggplot2” package in R. “Sepal.Length” is a continuous variable, which is plotted below on the x-axis.

Code:

library(ggplot2)
ggplot(data = iris, aes(x = Sepal.Length)) + geom_histogram(fill = "lightblue", col = "blue")

We will plot a bar chart from the iris dataset using the “ggplot2” package in R. “Species” is a categorical variable, which is plotted below on the x-axis.

Code:

ggplot(data = iris, aes(x = Species)) + geom_bar(fill = "skyblue")
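The same distinction holds in base R graphics. This sketch is an assumption-labelled alternative to the ggplot2 code above, using hist() for the continuous variable and barplot() on a table of counts for the categorical one.

```r
# Continuous variable: hist() bins the values and counts them per bin.
sepal_hist <- hist(iris$Sepal.Length, col = "lightblue", border = "blue",
                   main = "Histogram of Sepal.Length", xlab = "Sepal.Length")

# Categorical variable: count each category first, then draw one bar per category.
species_counts <- table(iris$Species)
barplot(species_counts, col = "skyblue",
        main = "Bar chart of Species", ylab = "Count")
```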

10.

What is a scatter plot? Explain with an example of how to create one scatter plot using R libraries.

Answer»

A scatter plot is a chart that plots two variables against each other, one point per observation, to show how they are related. We can consider the example of the iris dataset in R using the ggplot2 library.

# Example of ScatterPlot
library(ggplot2)

ggplot(iris, aes(y = Sepal.Length, x = Petal.Length)) + geom_point()

Sample output:

This shows the relationship between Sepal.Length and Petal.Length in the iris dataset, plotted with the R ggplot2 library.
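To put a number on the relationship, a possible sketch computes the Pearson correlation of the same two columns and draws the scatter plot with base graphics, coloured by species. Colouring by species and the name petal_sepal_cor are assumptions added for illustration.

```r
# Scatter plot with base graphics, one colour per species.
plot(iris$Petal.Length, iris$Sepal.Length,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Petal.Length", ylab = "Sepal.Length")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)

# Pearson correlation between the two plotted variables.
petal_sepal_cor <- cor(iris$Petal.Length, iris$Sepal.Length)
petal_sepal_cor
```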

11.

Provide 3 differences between ggplot2 and lattice packages?

Answer»
ggplot2 package:

  • It uses counts, not percentages, by default.
  • It plots the facets starting from the top-left.
  • It orders facets in the opposite direction compared to lattice.
  • Sorting each facet separately is not possible.

lattice package:

  • It plots the facets starting from the bottom-left.
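The facet-ordering difference can be sketched from the lattice side. By default lattice fills panels from the bottom-left, and its as.table argument switches to filling from the top-left. The mtcars dataset, the one-column layout and the object names are assumptions for illustration.

```r
library(lattice)

# Default lattice ordering: panels are filled starting from the bottom-left.
bottom_up <- xyplot(mpg ~ wt | factor(cyl), data = mtcars, layout = c(1, 3))

# as.table = TRUE fills the panels from the top-left instead.
top_down <- update(bottom_up, as.table = TRUE)

print(bottom_up)
print(top_down)
```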

12.

What is lattice package in R used for? Explain with an example.

Answer»

Lattice is a powerful, high-level data visualization system for R inspired by Trellis graphics, with an emphasis on multivariate data. It was written by Deepayan Sarkar.

We can take the mtcars dataset (a car dataset with parameters such as mileage, weight, number of gears, number of cylinders etc.) to demonstrate some sample visualizations using this package.

A density plot and a scatter plot matrix can be drawn with this library.

library(lattice)

# kernel density plot
densityplot(~mpg, data=mtcars,
   main="Density Plot", xlab="Miles per Gallon")

# scatterplot matrix
splom(mtcars[c(1,3,4,5,6)], main="MTCARS Data")

13.

How to make multiple plots on to a single page layout in R? Explain with an example. 

Answer»

It is simple to place multiple plots on a single page in R. The following syntax sets up a 2 x 2 grid of plots on a single page.

par(mfrow=c(2,2))

For example, suppose we want to display histograms of the sepal and petal widths and lengths in the iris dataset. On its own, each of the commands below displays one histogram on one page.

hist(iris$Sepal.Length)
hist(iris$Sepal.Width)
hist(iris$Petal.Length)
hist(iris$Petal.Width)

If we first run par(mfrow=c(2,2)) and then execute the above code for plotting the histograms, the four charts are displayed in a 2 x 2 format (2 rows with 2 columns). A sample representation of the result is shown in the diagram below.

Similarly, a 3 x 3 layout can be set with par(mfrow=c(3,3)), and so on.
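Because par() changes the layout for every later plot on the device, a common pattern is to save the old settings and restore them afterwards. The loop over column names is an assumption made to keep this sketch short.

```r
# Switch to a 2 x 2 grid, keeping the previous settings so they can be restored.
old_par <- par(mfrow = c(2, 2))

# Draw one histogram per numeric column of iris (columns 1 to 4).
for (column_name in names(iris)[1:4]) {
  hist(iris[[column_name]], main = column_name, xlab = column_name)
}

# Restore the original layout.
par(old_par)
```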

14.

List down at least 5 libraries in R that can be used for data visualization. Explain three of them briefly.

Answer»

The following libraries/packages in R are typically used for data visualization.

ggplot2, lattice, Leaflet, Highcharter, RColorBrewer, plotly, sunburstR, rgl, dygraphs

Of these, “ggplot2” is extremely popular, and some sources indicate that it is one of the most downloaded R packages for data visualization and graphics.

  • ggplot2 – is an implementation of the grammar of graphics and can be used for custom plots in R. While it is simple to create standard plots or charts in R, ggplot2 makes it easy to build “custom” plots that are difficult to create without it. We can use this library to build plots in a systematic fashion, i.e. create the plot with axes, then add points, then add a line, then add a statistical inference metric such as a confidence interval, then highlight a regression curve with its mathematical equation in the background, and so on.
  • RColorBrewer – is a library of ColorBrewer palettes. It provides colour schemes for maps and can be used to manipulate colours in plots, charts, graphs, maps etc. The palettes were designed by Cynthia Brewer. It can be used along with the “plotly” package as well.
  • Leaflet – is used for maps. We can create interactive maps with it. The R interface for Leaflet uses the “htmlwidgets” framework, so the maps can be embedded easily in markdown documents and in shiny UI applications.