Explore topic-wise InterviewSolutions.

This section includes InterviewSolutions, each offering curated interview questions and answers to sharpen your knowledge and support interview preparation. Choose a topic below to get started.

101.

What are Kubernetes labels?

Answer»

In Kubernetes, labels are key/value pairs attached to objects such as Pods. Labels are meant to specify identifying attributes of objects that are meaningful and relevant to users; they do not directly imply semantics to the core system. Labels are also used to organize and select subsets of objects. They can be attached to objects at creation time and can subsequently be added or modified at any time.

102.

What is a Pod in Kubernetes?

Answer»

A Pod in Kubernetes is a group of one or more containers with shared storage and network settings, and a specification for how to run the containers. It contains one or more application containers that are relatively tightly coupled and are executed on the same physical or virtual machine.

103.

What is Kubernetes DNS?

Answer»

The DNS or Domain Name System in Kubernetes is a mechanism for associating various types of information, such as IP addresses, with easy-to-remember names. Kubernetes DNS is set up automatically in clusters to provide a secure and lightweight mechanism for service discovery.

The built-in service discovery in Kubernetes DNS allows applications to find and communicate with each other very quickly.

104.

What is load balancing in Kubernetes?

Answer»

Technically, load balancing is the procedure for efficiently distributing network traffic among multiple backend services. It is a critical strategy for maximizing scalability and availability throughout the network. In Kubernetes, there are various choices for load balancing external traffic to Pods.

105.

What is a Kubernetes endpoint?

Answer»

An Endpoint in Kubernetes is basically the list of addresses (IP and port) of the endpoints that implement a Kubernetes Service.

106.

What is a headless service in Kubernetes?

Answer»

A headless service in Kubernetes is a service whose service IP returns the IPs of its associated Pods instead of performing load balancing. This allows us to interact directly with the Pods instead of going through a proxy.

107.

What are a Service and a Deployment in Kubernetes?

Answer»

A Service in Kubernetes is an abstraction that defines a logical set of Pods running somewhere in your cluster and providing the same functionality.

A Deployment in Kubernetes is a resource object that provides declarative updates to applications. A Deployment enables you to describe an application's life cycle, such as which images to use for the app, the number of Pods that should be present, and the way they should be updated.

What is the difference between a Pod and a Deployment in Kubernetes?

Kubernetes Pod | Kubernetes Deployment
A collection of containers and basic objects; all the containers of a Pod run on the same node. | A kind of controller in Kubernetes; controllers provide a Pod template that is used to create the Pods they are responsible for.
It is not suitable for the production stage on its own. | It is best suited for the production stage.
It has no rollout or rollback updates. | It has both rollout and rollback updates.
It does not monitor each stage of an update. | It monitors each stage of the rollout of its Pods.
108.

What is Kubernetes ClusterIP?

Answer»

A Kubernetes ClusterIP is the unique IP address assigned to each Kubernetes Service as soon as it is created.

109.

What is Prometheus in Kubernetes?

Answer»

Prometheus is an open-source metrics collection system originally developed at SoundCloud and later adopted by the CNCF. It is powerful due to its data model, a rich set of client libraries, and its ability to create user-specific custom alerts or notifications based on collected metrics. Prometheus comes with its own standard dashboard which can be used to run ad-hoc queries or debug minor issues, but it is most useful when integrated with visualization backends.

Note: The above interview questions on Kubernetes are a guide for experienced, intermediate, and beginner developers preparing for a job change.

110.

What is Kubernetes metrics server?

Answer»

Metrics Server is a cluster-wide aggregator of resource usage data. It is used to collect and manage such data across the resources in a cluster. Its primary purpose is to collect metrics like CPU and memory consumption for containers and nodes from the Summary API exposed by the Kubelet on each node.

111.

What is Heapster in Kubernetes?

Answer»

Heapster is an open-source metrics collection system that is used for performance monitoring and is compatible with Kubernetes versions 1.0.6 and above. It collects performance metrics about workloads, pods, nodes, and containers. In addition, it also allows for the collection of metrics from events and other signals generated by your cluster. It supports multiple backends for persisting the data.

112.

What are the components of Kubernetes?

Answer»

Components in Kubernetes are the vital processes in a cluster that support the overall scalability and maintenance of the environment. These components combine to make the cluster perform efficiently.

Components fall under two main categories, each consisting of sub-components:

1. Master Components:
  • Kube-apiserver
  • Etcd
  • Kube-scheduler
  • Kube-controller-manager
2. Node Components:
  • Kubelet
  • Kube-Proxy
  • Container Runtime
113.

What is Minikube in Kubernetes?

Answer»

Minikube is an open-source tool that enables a user to run Kubernetes on a laptop or other local machine. It can work with various operating systems like Linux, macOS, and Windows. It runs as a single-node cluster inside a virtual machine on your local machine. It is used to evaluate important Kubernetes features and to help scale the environment via effective maintenance and control.

Note: The above Kubernetes interview questions are a guide for candidates and may help in securing a relevant job.

114.

What is the use of Kubelet?

Answer»

A Kubelet is the lowest-level component in Kubernetes. It is responsible for maintaining what's running on every individual machine. It is like a process watcher focused on running a given set of containers.

115.

What is cAdvisor in Kubernetes?

Answer»

cAdvisor is an open-source resource usage collector for containers in a cluster. It is used in Kubernetes to support containers natively while operating on a per-node basis.

It auto-discovers all containers on a specific node and collects CPU, memory, filesystem, and network usage stats. cAdvisor also provides useful metrics like overall machine usage by analyzing the 'root' container on the machine.

116.

How many nodes are in a Kubernetes cluster?

Answer»

As of Kubernetes v1.17, a single cluster can support up to 5,000 nodes.

117.

What is the need for Container Orchestration?

Answer»

Container orchestration is basically managing the lifecycle of multiple containers, especially in large and/or dynamic environments. Development teams generally use it to control and automate tasks such as:

  • Setting up and deploying containers
  • Replication and availability of containers
  • Scaling containers up or removing them in such a way that load is spread evenly across the host architecture
  • Moving specific containers between clusters or from one host to another to maintain the balance of supply/demand on a host, or if a host expires
  • Allocating resources to containers as per requirement
  • Controlling the external exposure of services running in a container to the outside environment
  • Load balancing and service discovery between containers
  • Health monitoring of containers and hosts
  • Configuration of an application in accordance with the containers running in it

Note: The above Kubernetes interview questions and answers are a guide for candidates preparing for a job change.

118.

Which is better: Docker Swarm or Kubernetes?

Answer»

Though both Docker Swarm and Kubernetes aim to save resources by limiting the use of hardware and leveraging cloud-based applications to meet business resource requirements, there are some differences between them that call for careful analysis before choosing one.

Docker Swarm | Kubernetes
Easy to install, though it requires familiarity with the Docker CLI. | Installation is a bit more advanced and time-consuming.
No in-built tools for logging and monitoring, though third-party apps can be added. | Has in-built tools for logging and performance monitoring.
Can scale up faster than Kubernetes. | Not as quick to scale as Docker Swarm, but known for its strength in maintaining robust clusters.
119.

What is the Spark driver?

Answer»
120.

What is the difference between client and cluster mode in Spark?

Answer»
121.

What are broadcast variables?

Answer»
122.

What is explode in Spark SQL?

Answer»
123.

What is a sliding interval in Spark?

Answer»
124.

What is Spark checkpointing?

Answer»
125.

What is the difference between Spark and Hive?

Answer»
126.

What is the use of accumulators in Spark?

Answer»
127.

What is the Spark Catalyst?

Answer»
128.

How many types of RDD are there in Spark?

Answer»

There are two types of RDD operations in Spark. They are:

  • Transformation: a type of function in which a new RDD is created from an existing RDD.
  • Action: a type of function used when the user wants to work with an actual dataset. A short example follows below.
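
As a rough illustration (a minimal sketch assuming a running SparkSession named spark), map is a transformation that only describes a new RDD, while collect is an action that actually triggers the computation:

      // Transformation: lazily describes a new RDD; nothing runs yet
      val numbers = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))
      val doubled = numbers.map(_ * 2)

      // Action: triggers the computation and returns the results to the driver
      val result = doubled.collect()
      result.foreach(println)   // 2, 4, 6, 8, 10
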
129.

How can an RDD be created in Spark?

Answer»

RDDs or Resilient Distributed Datasets are the fundamental data structure in Spark. They are immutable and fault-tolerant in nature. There are multiple ways to create RDDs in Spark. They are:

  • Creating an RDD from a Seq or List using parallelize()

           RDDs can be created by taking an existing collection in the driver program and passing it to SparkContext's parallelize() method. Here's an example:
      val rdd = spark.sparkContext.parallelize(Seq(("Java", 10000),
        ("Python", 200000), ("Scala", 4000)))
      rdd.foreach(println)
       Output
      (Python,200000)
      (Scala,4000)
      (Java,10000)

  • Creating an RDD from a text file

     Mostly, in production systems, users generate RDDs from files by simply reading the data from the files. Let us see how:
     val rdd = spark.sparkContext.textFile("/path/textFile.txt")
     The above line of code creates an RDD in which each record represents a line of the file.

  • Creating RDDs from DataFrames and Datasets

     You can easily convert any DataFrame or Dataset into an RDD by accessing its rdd property. Here's how:
     val myRdd2 = spark.range(20).toDF().rdd
     In the above line of code, spark.range(20).toDF() creates a DataFrame, and .rdd returns the underlying newly created RDD.

130.

What is the difference between cache and persist in Spark?

Answer»
Cache() | Persist()
The default storage level is used: MEMORY_ONLY for RDD and MEMORY_AND_DISK for Dataset. | The user can choose from the various storage levels for both RDD and Dataset.
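
A rough sketch (assuming a DataFrame named df already exists) of the difference: cache() always uses the default level, while persist() accepts an explicit StorageLevel:

      import org.apache.spark.storage.StorageLevel

      // cache() uses the default storage level (MEMORY_AND_DISK for a Dataset/DataFrame)
      df.cache()
      df.count()        // materializes the cache
      df.unpersist()    // release it before choosing a different level

      // persist() lets you choose the storage level explicitly
      df.persist(StorageLevel.MEMORY_AND_DISK_SER)
      df.count()
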
131.

Why does Spark use lazy evaluation?

Answer»

Spark uses lazy evaluation for the following reasons:

  • It increases the manageability of the program by dividing it into smaller operations, and reduces the number of passes over the data by grouping operations.
  • It increases speed and saves computational overhead by computing only the values that are actually needed.
  • It reduces complexity by allowing users to work with an infinite data structure while drastically reducing time and space overheads.
  • It optimizes the program by reducing the number of queries being run. A small illustration follows below.
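
A minimal sketch (assuming a SparkSession named spark and a hypothetical log file path) of how transformations stay lazy until an action forces evaluation:

      val lines = spark.sparkContext.textFile("/path/logs.txt")   // nothing is read yet
      val errors = lines.filter(_.contains("ERROR"))              // still lazy
      val firstWords = errors.map(_.split(" ")(0))                // still lazy

      // Only this action triggers reading the file and running the whole pipeline
      println(firstWords.count())
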
132.

What is the difference between coalesce and repartition in Spark?

Answer»
Coalesce | Repartition
It is used only for decreasing the number of partitions in a DataFrame. | This method can decrease or increase the number of partitions in a DataFrame.
It reuses the existing partitions to minimize the amount of data being shuffled. | It creates new partitions and performs a full shuffle.
The resulting partitions may be of variable sizes. | The resulting partitions are of roughly the same size.
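
A small sketch (assuming a DataFrame df with, say, 8 partitions) comparing the two:

      // repartition can increase the partition count, at the cost of a full shuffle
      val more = df.repartition(16)
      println(more.rdd.getNumPartitions)   // 16

      // coalesce only decreases the partition count and avoids a full shuffle
      val fewer = df.coalesce(4)
      println(fewer.rdd.getNumPartitions)  // 4
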
133.

What are the benefits of Spark over MapReduce?

Answer»

Here are some of the advantages of using Spark rather than Hadoop's MapReduce:

  • Spark is relatively easier to program and requires a lot less actual coding than MapReduce.
  • Spark has a built-in interactive mode, whereas MapReduce, by default, only does batch processing and has no built-in interactive mode.
  • Spark uses a data abstraction, the RDD, to make its features more productive, whereas MapReduce has no comparable concept.
  • Spark executes batch processing jobs roughly 10x to 100x faster than MapReduce.
  • Spark is considered a general-purpose cluster computing engine due to its various methods for data processing such as streaming, batch processing, and machine learning, whereas MapReduce only has a batch engine.
  • Spark achieves lower latency via partial or complete caching of results across various nodes, whereas MapReduce is disk-based and incurs far higher latency.
134.

What is coalesce in Spark?

Answer»

In Spark, coalesce is simply another method for controlling the partitioning of data in a DataFrame. It is primarily used to reduce the number of partitions in a DataFrame. It is most commonly used when the user wants to decrease the number of partitions without incurring a full shuffle.

135.

What is the difference between RDD and DataFrame?

Answer»
RDD | DataFrame
It is the representation of a set of records, an immutable collection of objects for distributed computing. | It is used for storing data and is essentially the equivalent of a table in a relational database, with richer optimization.
It is an array of references to partitioned objects representing a large set of data. | It is a distributed collection of data organized into named rows and columns.
All the datasets are logically partitioned across servers so they can be computed across different nodes in a cluster. | It has a matrix-like structure with columns of different types, such as numeric, logical, and so on.
It supports compile-time type safety, having been based on object-oriented programming. | If the user tries to access a non-existent column, there is only a runtime attribute error; there is no compile-time type safety.
Almost all data sources are supported by RDDs. | DataFrames require data sources in formats such as JSON, CSV, or Avro, or storage systems such as Hive, HDFS, or MySQL tables.
136.

What happens if an RDD partition is lost due to a worker node failure?

Answer»

In Spark, if any partition of an RDD is lost due to the failure of a worker node, that partition can be re-computed using the lineage of operations from the original fault-tolerant dataset.
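
As a small illustrative sketch (assuming a SparkSession named spark), the lineage Spark keeps for this recomputation can be inspected with toDebugString:

      val base = spark.sparkContext.parallelize(1 to 1000, 8)
      val derived = base.map(_ * 2).filter(_ % 3 == 0)

      // Prints the chain of transformations (the lineage) that Spark would
      // replay to rebuild any lost partition of `derived`
      println(derived.toDebugString)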

What is Spark GraphX used for?

Here are the uses of GraphX in Spark:

  • It can be used for unifying ETL, exploratory analysis, and computation of iterative graphs within a single system.
  • It can be used to present data in the form of graphs and collections while transforming and joining graphs with RDDs.
  • It can be used for writing custom iterative graph algorithms with the help of the Pregel API.
137.

What is the Catalyst Optimizer in Spark?

Answer»

The optimizer used by Spark SQL is the Catalyst Optimizer. Its main job is to optimize queries written in Spark SQL and the DataFrame DSL. Queries that go through Catalyst generally run much faster than equivalent hand-written RDD code.
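
A quick sketch (assuming a DataFrame df of purchase records, a hypothetical example) of how to inspect what Catalyst produces: explain() prints the logical and physical plans it has generated and optimized:

      import spark.implicits._

      val summary = df
        .filter($"amount" > 100)
        .groupBy($"country")
        .count()

      // Shows the parsed, analyzed, and optimized logical plans plus the physical plan
      summary.explain(true)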

138.

What are the actions in Spark?

Answer»

In Spark, actions are RDD operations whose values are returned back to the Spark driver program, which then kicks off a job to be executed on the cluster. reduce, collect, take, and saveAsTextFile are common examples of actions in Apache Spark.
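
A brief sketch (assuming a SparkSession named spark and a hypothetical output path) showing a few common actions:

      val nums = spark.sparkContext.parallelize(1 to 100)

      println(nums.reduce(_ + _))           // action: aggregates all elements on the driver
      println(nums.take(5).mkString(","))   // action: returns the first 5 elements
      nums.saveAsTextFile("/tmp/nums-out")  // action: writes the RDD out as text files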

139.

What is the difference between Hadoop and Spark?

Answer»
Spark | Hadoop
It is a data analytics engine. | It is a big data processing engine.
Used to process real-time data, using real-time events from sources like Twitter and Facebook. | Batch processing of huge volumes of data.
Has low-latency computing. | Has high-latency computing.
Can process data interactively. | Processes data extracted in batch mode.
It is easier to use; it enables a user to process data using high-level operators through abstractions. | Hadoop's model is a bit more complex; you need to handle low-level APIs.
Has in-memory computation, so no external scheduler is required. | An external job scheduler is required for memory computation.
It is a bit less secure compared to Hadoop. | Highly secure.
Costlier than Hadoop. | Less costly.
140.

What is the use of Spark Streaming?

Answer»

Spark Streaming is an extension of the core Spark API. Its main use is to allow data engineers and data scientists to process real-time data from multiple sources like Kafka, Amazon Kinesis, and Flume. The processed data can be exported to file systems, databases, and dashboards for further analysis.
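
A minimal word-count sketch using the classic DStream API (assuming an existing SparkSession named spark and a text stream on localhost:9999, which is purely a hypothetical source):

      import org.apache.spark.streaming.{Seconds, StreamingContext}

      // Micro-batch interval of 10 seconds, reusing the existing SparkContext
      val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

      val lines = ssc.socketTextStream("localhost", 9999)
      val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      counts.print()

      ssc.start()             // start receiving and processing data
      ssc.awaitTermination()  // block until the stream is stopped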

141.

What is the PageRank algorithm in Spark?

Answer»

The PageRank algorithm in Spark outputs a probability distribution that represents the likelihood of a person randomly clicking on links arriving at a particular page.
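
A short GraphX sketch (assuming an edge-list file at a hypothetical path and a SparkContext named sc):

      import org.apache.spark.graphx.GraphLoader

      // Load a graph from an edge list: one "srcId dstId" pair per line
      val graph = GraphLoader.edgeListFile(sc, "/path/followers.txt")

      // Run PageRank until the ranks converge within the given tolerance
      val ranks = graph.pageRank(0.0001).vertices
      ranks.take(5).foreach(println)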

142.

What is a Parquet file in Spark?

Answer»

Parquet is a column-based file format used to optimize the speed of queries; it is far more efficient than a CSV or JSON file format. Spark SQL supports both read and write operations on Parquet files, and it captures the schema of the original data automatically.
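
A small sketch (assuming a DataFrame df and hypothetical paths) of writing and reading Parquet with Spark SQL:

      // Write the DataFrame out in Parquet format (the schema is stored with the data)
      df.write.parquet("/tmp/people.parquet")

      // Read it back; the schema is recovered automatically
      val people = spark.read.parquet("/tmp/people.parquet")
      people.printSchema()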

Why is Spark faster than Hive?

Spark is faster than Hive because it does the processing of data in the main memory of worker nodes, thereby preventing unnecessary I/O operations with disks.

143.

What is the Tungsten engine in Spark?

Answer»

Tungsten is the codename for a project in Apache Spark whose main function is to make changes to the execution engine. The Tungsten engine in Spark is used to substantially increase the efficiency of memory and CPU for Spark applications by pushing standard performance limits further, as far as the hardware allows.

144.

What is Apache Spark and what is it used for?

Answer»

Apache Spark is an open-source, general-purpose distributed data processing engine used to process and analyze large amounts of data efficiently. It has a wide array of uses in ETL and SQL batch jobs, processing of data from sensors, IoT data management, financial systems, and machine learning tasks.

145.

What are the features of Spark?

Answer»

Spark has the following important features which help developers in many ways:

  • Speed − It enables applications to run efficiently on a Hadoop cluster, up to 100x faster in memory and 10x faster when running on disk. By reducing the number of read/write operations on disk and storing intermediate processing data in memory, it saves valuable time.
  • Support for multiple languages − Spark comes with built-in APIs in Java, Scala, and Python. With more than 80 high-level operators for interactive querying, Spark helps developers easily code in multiple languages.
  • Advanced analytics − Spark supports SQL queries, data streaming, machine learning, and graph algorithms, along with full support for "Map" and "Reduce" functionality.
146.

Can you name some popular websites or application platforms which are using the Django framework?

Answer»

At present, Django has grown to be a powerful and robust framework that is utilized by a number of platforms. Websites that use this framework to serve high traffic include:

  • Instagram
  • Mozilla
  • Spotify
  • NASA
  • Eventbrite
  • YouTube
147.

Tell us the steps to set up static files in Django.

Answer»

The major steps to set up static files in Django are:

  • Set STATIC_ROOT in settings.py
  • Run manage.py collectstatic
  • Set up a static files entry on the PythonAnywhere web tab

This is one of the major Django interview questions and answers.

148.

Describe the inheritance styles in Django?

Answer»

There are three inheritance styles in Django:

  • Abstract base classes: This inheritance style is used when the developer wants the parent class to hold data that they do not want to type out for every single child model.
  • Multi-table inheritance: This inheritance style is used when you want to subclass an existing model and have each of the models get its own database table.
  • Proxy models: This inheritance style allows the user to modify the Python-level behavior without modifying the model's fields.
149.

What does the Django template include?

Answer»

A template refers to a simple yet efficient text file that can produce text-based formats such as XML, HTML, and many more. A Django template contains variables, which are replaced with values when the template is evaluated, and tags, which control the logic of the template.

150.

Do you think Django is a content management system?

Answer»

No, Django is not a content management system; rather, it is a web development framework or a programming tool that helps developers build decent websites.