
1.

While importing data from different sources, can the pandas library recognize dates?

Answer»

Yes, it can, with a little help. We need to pass the parse_dates argument when reading data from the source. For example, when reading a CSV file we may encounter date-time formats that pandas does not recognize automatically. In this case, pandas gives us the flexibility to build a custom date parser with a lambda function, as shown below:

import pandas as pd
from datetime import datetime
dateparser = lambda date_val: datetime.strptime(date_val, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv("some_file.csv", parse_dates=['datetime_column'], date_parser=dateparser)
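
The date_parser argument is deprecated as of pandas 2.0, so the following is a minimal sketch of one alternative: read the file first and convert the column afterwards with pd.to_datetime and an explicit format. The in-memory CSV text and the "value" column are assumptions made up for illustration; only datetime_column comes from the example above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "some_file.csv".
csv_text = (
    "datetime_column,value\n"
    "2021-03-01 08:30:00,10\n"
    "2021-03-02 09:45:00,20\n"
)

# Read the raw text, then convert the column using an explicit format string.
df = pd.read_csv(io.StringIO(csv_text))
df["datetime_column"] = pd.to_datetime(
    df["datetime_column"], format="%Y-%m-%d %H:%M:%S"
)

print(df.dtypes)
```

Converting after the read keeps the parsing step independent of the read_csv keyword arguments, which have changed between pandas versions.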
2.

How will you get the items that are not common to both the given series A and B?

Answer»

We can achieve this by first computing the union of both series and then their intersection. The items we want are the elements of the union that do not appear in the intersection.

The following code demonstrates this:

import pandas as pd
import numpy as np
df1 = pd.Series([2, 4, 5, 8, 10])
df2 = pd.Series([8, 10, 13, 15, 17])
p_union = pd.Series(np.union1d(df1, df2))  # union of series
p_intersect = pd.Series(np.intersect1d(df1, df2))  # intersection of series
unique_elements = p_union[~p_union.isin(p_intersect)]
print(unique_elements)
"""
Output:
0     2
1     4
2     5
5    13
6    15
7    17
dtype: int64
"""
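
As a shorter variant of the same idea, NumPy's setxor1d computes the symmetric difference directly. This is a sketch of an alternative, not the approach described above; the names series_a, series_b and not_common are chosen here for illustration.

```python
import numpy as np
import pandas as pd

series_a = pd.Series([2, 4, 5, 8, 10])
series_b = pd.Series([8, 10, 13, 15, 17])

# setxor1d returns the sorted values that occur in exactly one of the inputs.
not_common = pd.Series(np.setxor1d(series_a.to_numpy(), series_b.to_numpy()))

print(not_common.tolist())
```

Unlike the union/intersection version, the result gets a fresh 0..n-1 index, because it is built from a new array.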
3.

Can you get items of series A that are not available in another series B?

Answer»

This can be achieved by combining the ~ (negation) operator with the isin() method, as shown below.

import pandas as pd
df1 = pd.Series([2, 4, 8, 10, 12])
df2 = pd.Series([8, 12, 10, 15, 16])
df1 = df1[~df1.isin(df2)]
print(df1)
"""
Output:
0    2
1    4
dtype: int64
"""
4.

How will you delete indices, rows and columns from a dataframe?

Answer»

To delete an Index:

  • Execute del df.index.name to remove the index name. In recent pandas versions this statement may raise an AttributeError, in which case use the assignment below.
  • Alternatively, assign None to df.index.name.
  • For example, if you have the below dataframe:
       Column 1
Names
John          1
Jack          2
Judy          3
Jim           4
  • To drop the index name “Names”:
df.index.name = None
# Or run the below:
# del df.index.name
print(df)
      Column 1
John         1
Jack         2
Judy         3
Jim          4

To delete row/column from dataframe:

  • The drop() method is used to delete a row or column from a dataframe.
  • The axis argument tells drop() what to remove: 0 drops rows and 1 drops columns.
  • Additionally, we can delete the rows/columns in place by setting inplace to True, so that no reassignment is needed.
  • Duplicate rows (or duplicate values in a single column) can be removed by using the drop_duplicates() method.
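
The points above can be sketched as follows. The dataframe contents, the row labels r1 to r4 and the variable names are assumptions invented for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["John", "Jack", "Judy", "Judy"],
        "score": [1, 2, 3, 3],
        "grade": ["a", "b", "c", "c"],
    },
    index=["r1", "r2", "r3", "r4"],
)

without_row = df.drop("r1", axis=0)        # drop a row by its label
without_column = df.drop("grade", axis=1)  # drop a column by its label
deduplicated = df.drop_duplicates()        # r4 repeats r3, so r4 is removed

print(without_row.index.tolist())
print(without_column.columns.tolist())
print(deduplicated.index.tolist())
```

Each call returns a new dataframe and leaves df unchanged, since inplace is left at its default of False.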
5.

How do you add a new column to a pandas dataframe?

Answer»

A new column can be added to a pandas dataframe as follows:

import pandas as pd
data_info = {'first': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
             'second': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data_info)
# To add new column third
df['third'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)
# To add new column fourth
df['fourth'] = df['first'] + df['third']
print(df)
6.

What do you understand by reindexing in pandas?

Answer»

Reindexing is the process of conforming a dataframe to a new index with optional filling logic. Labels in the new index that were not present in the previous index get NaN/NA in their place. A new object is returned unless the new index is equivalent to the current one and copy is set to False. Reindexing can also be used to change the index of both rows and columns in the dataframe.
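
A minimal sketch of this behaviour, assuming a small made-up dataframe with a single "score" column:

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "c"])

# "d" is not in the original index, so its row is filled with NaN.
reindexed = df.reindex(["a", "c", "d"])

# fill_value supplies a default instead of NaN for new labels.
filled = df.reindex(["a", "c", "d"], fill_value=0)

print(reindexed)
print(filled)
```

Row "b" is dropped in both results because it is absent from the new index.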

7.

How will you identify and deal with missing values in a dataframe?

Answer»

We can identify whether a dataframe has missing values by using the isnull() or isna() method (the two are equivalent). For example, to count the missing values in each column:

missing_data_count = df.isnull().sum()

We can handle missing values either by replacing them with 0, as follows:

df['column_name'] = df['column_name'].fillna(0)

Or by replacing them with the mean value of the column:

df['column_name'] = df['column_name'].fillna(df['column_name'].mean())
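
Put together on a small made-up dataframe, the steps might look like the sketch below. The column names and values are assumptions, and dropna() is shown as one further option that the answer above does not cover.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"age": [20.0, np.nan, 40.0], "city": ["Pune", None, "Delhi"]}
)

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Replace missing ages with the column mean (the mean of 20 and 40 is 30).
age_filled = df["age"].fillna(df["age"].mean())

# Alternatively, keep only the rows that have no missing values at all.
complete_rows = df.dropna()

print(missing_per_column)
```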
8.

Can you create a series from the dictionary object in pandas?

Answer»

A series is a one-dimensional labeled array capable of storing different data types. We can create a pandas series from a dictionary object as shown below:

import pandas as pd
dict_info = {'key1': 2.0, 'key2': 3.1, 'key3': 2.2}
series_obj = pd.Series(dict_info)
print(series_obj)
Output:
key1    2.0
key2    3.1
key3    2.2
dtype: float64

If an index is not specified, the keys of the dictionary are used to construct the index. Older pandas versions sorted the keys in ascending order, while current versions keep the dictionary's insertion order. If an index is passed, the values corresponding to its labels are extracted from the dictionary.
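
The effect of passing an index can be sketched as follows. The label "key9" is a hypothetical key, chosen because it is missing from the dictionary.

```python
import pandas as pd

dict_info = {"key1": 2.0, "key2": 3.1, "key3": 2.2}

# Only the requested labels are kept, in the requested order.
# "key9" is absent from the dictionary, so it maps to NaN.
series_obj = pd.Series(dict_info, index=["key3", "key1", "key9"])

print(series_obj)
```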

9.

How will you combine different pandas dataframes?

Answer»

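The dataframes can be combined using the below approaches:

  • append() method: This is used to stack the dataframes vertically, adding the rows of one dataframe below the other. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so concat() is preferred in current versions. Syntax:
df1.append(df2)
  • concat() method: This stacks dataframes vertically by default, or horizontally when axis=1 is passed. This is best used when the dataframes have the same columns and similar fields. Syntax:
pd.concat([df1, df2])
  • join() method: This is used for combining the columns of dataframes that share an index or a common key column. Syntax:
df1.join(df2)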
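
A short sketch of concat() and join() on made-up data. The "ages" dataframe and all values are assumptions for illustration, and append() is left out because it is not available in pandas 2.0 and later.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["John", "Jack"], "score": [1, 2]})
df2 = pd.DataFrame({"name": ["Judy", "Jim"], "score": [3, 4]})
ages = pd.DataFrame({"age": [30, 25]})

# Rows of df2 are placed under df1; ignore_index renumbers the rows 0..3.
stacked = pd.concat([df1, df2], ignore_index=True)

# Columns are placed next to each other, aligned on the row index.
side_by_side = pd.concat([df1, ages], axis=1)

# join() also aligns on the row index by default.
joined = df1.join(ages)

print(stacked)
print(joined)
```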
10.

Define pandas dataframe.

Answer»

A dataframe is a 2D mutable and tabular structure for representing data labelled with axes - rows and columns.
The syntax for creating a dataframe:

import pandas as pd
dataframe = pd.DataFrame(data, index, columns, dtype)

where:

  • data - Represents various forms like series, map, ndarray, lists, dict etc.
  • index - Optional argument for the row labels.
  • columns - Optional argument for column labels.
  • dtype - Optional argument for the data type of each column.
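
A concrete instance of the syntax above, as a minimal sketch. The values, row labels and column labels are made up for illustration.

```python
import pandas as pd

data = [[1, 2], [3, 4]]

dataframe = pd.DataFrame(
    data,
    index=["row1", "row2"],  # row labels
    columns=["a", "b"],      # column labels
    dtype=float,             # every column is stored as float64
)

print(dataframe)
```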
11.

What do you know about pandas?

Answer»
  • Pandas is an open-source, Python-based library used in data manipulation applications requiring high performance. The name is derived from “Panel Data”, a term for multidimensional data. It was developed in 2008 by Wes McKinney for data analysis.
  • Pandas is useful in performing 5 major steps of data analysis - load the data, clean/manipulate it, prepare it, model it, and analyze it.