| 1. |
Explain Reliability And Failure Handling In Apache Flume? |
|
Answer» Flume NG uses channel-based transactions to guarantee reliable message delivery. When a message moves from one agent to another, two transactions are started: one on the agent that delivers the event and one on the agent that receives it. For the sending agent to commit its transaction, it must receive a success indication from the receiving agent, and the receiving agent returns success only once its own transaction has committed. This ensures guaranteed-delivery semantics between the hops that the flow makes. |
|
| 2. |
What Are The Similarities And Differences Between Apache Flume And Apache Kafka? |
|
Answer» Both are commonly used to move large volumes of event data, but they differ in their delivery model: Flume pushes messages to their destination via its sinks, whereas with Kafka you must pull messages from the Kafka broker using the Kafka consumer API. |
|
| 3. |
Can You Explain About Configuration Files? |
|
Answer» The agent configuration is stored in a local configuration file. It comprises each agent's source, sink, and channel information. |
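As an illustration, a minimal sketch of such a configuration file; the agent name (a1) and the netcat/memory/logger component choices here are arbitrary examples, not required values:

    # a minimal single-agent sketch; names a1, r1, c1, k1 are arbitrary
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # a netcat source listening on a local port (example values)
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    a1.sources.r1.channels = c1

    # an in-memory channel
    a1.channels.c1.type = memory

    # a logger sink that writes events to the agent's log
    a1.sinks.k1.type = logger
    a1.sinks.k1.channel = c1

The agent could then be launched with, for example, flume-ng agent --conf-file example.conf --name a1.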
|
| 4. |
Can Flume Distribute Data To Multiple Destinations? |
|
Answer» Yes. Flume supports multiplexing flows: an event can flow from one source to multiple channels and on to multiple destinations. This is achieved by defining a flow multiplexer, as sketched below. |
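A hedged sketch of such a flow, routing on a hypothetical "state" header to two channels (all names and values are illustrative):

    # route events to c1 or c2 based on the value of the "state" header
    a1.sources.r1.channels = c1 c2
    a1.sources.r1.selector.type = multiplexing
    a1.sources.r1.selector.header = state
    a1.sources.r1.selector.mapping.US = c1
    a1.sources.r1.selector.mapping.EU = c2
    a1.sources.r1.selector.default = c1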
|
| 5. |
How Multi-hop Agent Can Be Setup In Flume? |
|
Answer» The Avro RPC bridge mechanism is used to set up a multi-hop agent in Apache Flume: one agent's Avro sink sends events to the next agent's Avro source. |
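As a sketch, a two-hop setup pairs an Avro sink on the first agent with an Avro source on the next; the hostname and port values here are placeholders:

    # agent1: forwards events to the next hop over Avro RPC
    agent1.sinks.k1.type = avro
    agent1.sinks.k1.hostname = collector.example.com
    agent1.sinks.k1.port = 4545
    agent1.sinks.k1.channel = c1

    # agent2 (on collector.example.com): receives events from agent1
    agent2.sources.r1.type = avro
    agent2.sources.r1.bind = 0.0.0.0
    agent2.sources.r1.port = 4545
    agent2.sources.r1.channels = c1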
|
| 6. |
What Are Sink Processors? |
|
Answer» Sink processors are the mechanism by which you can create fail-over behaviour and load balancing across a group of sinks. |
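A sketch of both processor types on a hypothetical sink group g1 (sink names and priority values are examples):

    # failover: k2 takes over when the higher-priority k1 fails
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 10
    a1.sinkgroups.g1.processor.priority.k2 = 5
    a1.sinkgroups.g1.processor.maxpenalty = 10000

    # alternatively, load-balance events across the group:
    # a1.sinkgroups.g1.processor.type = load_balance
    # a1.sinkgroups.g1.processor.selector = round_robin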
|
| 7. |
What Is Apache Spark? |
|
Answer» Spark is a fast, easy-to-use and flexible data processing framework. It has an advanced execution engine supporting cyclic data flow and in-memory computing. Spark can run on Hadoop, standalone, or in the cloud, and is capable of accessing diverse data sources including HDFS, HBase, Cassandra and others. |
|
| 8. |
What Is Flume Event? |
|
Answer» A unit of data with a set of string attributes is called a Flume event. An external source, such as a web server, sends events to the Flume source, and Flume has built-in functionality to understand the source format. Each log record is considered an event. Each event has a header section and a value (body) section: the headers hold descriptive key-value attributes, and the body holds the payload itself. |
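As an illustration (all values below are made up), an event carrying one web-server log line might look like:

    headers: { timestamp=1625097600000, host=web01.example.com }
    body:    192.168.0.10 - - [01/Jul/2021:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 512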
|
| 9. |
Why Flume? |
|
Answer» Flume is not limited to collecting logs from distributed systems; it is also capable of serving other use cases, such as collecting streaming data from social media sites and web servers for delivery into a centralized store like HDFS. |
|
| 10. |
What Are Interceptors? |
|
Answer» Interceptors are used to filter or modify events in flight between the source and the channel. They can drop unnecessary events or pass through only targeted log records. Depending on requirements, you can chain any number of interceptors. |
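A sketch chaining two interceptors on a source; the regex and the interceptor names are illustrative:

    # i1 stamps each event with a timestamp header;
    # i2 drops events whose body matches the regex
    a1.sources.r1.interceptors = i1 i2
    a1.sources.r1.interceptors.i1.type = timestamp
    a1.sources.r1.interceptors.i2.type = regex_filter
    a1.sources.r1.interceptors.i2.regex = .*DEBUG.*
    a1.sources.r1.interceptors.i2.excludeEvents = true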
|
| 11. |
Tell Any Two Features Of Flume? |
|
Answer» Flume efficiently collects and aggregates data, and it moves large amounts of log data from many different sources to a centralized data store. |
|
| 12. |
Does Flume Provide 100% Reliability To The Data Flow? |
|
Answer» Yes, Apache Flume provides end-to-end reliability because of its transactional approach to data flow. |
|
| 13. |
What Are The Data Extraction Tools In Hadoop? |
|
Answer» Sqoop can be used to transfer data between an RDBMS and HDFS. Flume can be used to extract streaming data from social media, web logs, etc., and store it in HDFS. |
|
| 14. |
What Are Flume Core Components ? |
|
Answer» Source, channel, and sink are the core components of Apache Flume. When a Flume source receives an event from an external source, it stores the event in one or more channels. The Flume channel temporarily stores the event and keeps it until it is consumed by the Flume sink; it acts as the Flume repository. The Flume sink removes the event from the channel and puts it into an external repository like HDFS, or moves it on to the next Flume agent. |
|
| 15. |
What Are The Complicated Steps In Flume Configurations? |
|
Answer» Flume processes streaming data, so once started there is no stop/end to the process; it asynchronously flows data from the source to HDFS via the agent. First of all, the agent must know how the individual components are connected in order to load data, so the configuration is the trigger that starts loading streaming data. For example, consumerKey, consumerSecret, accessToken, and accessTokenSecret are the key parameters for downloading data from Twitter. |
|
| 16. |
Explain What Are The Tools Used In Big Data? |
|
Answer» Tools used in Big Data include Hadoop and its ecosystem tools such as Flume, Sqoop, Spark, Hive, and HBase. |
|
| 17. |
What Is FlumeNG? |
|
Answer» FlumeNG is a real-time loader for streaming your data into Hadoop. It stores data in HDFS and HBase. You'll want to get started with FlumeNG, which improves on the original Flume. |
|
| 18. |
Why We Are Using Flume? |
|
Answer» Most often, Hadoop developers use this tool to get data from social media sites. It was developed by Cloudera for aggregating and moving very large amounts of data. The primary use is to gather log files from different sources and asynchronously persist them in the Hadoop cluster. |
|
| 19. |
Differentiate Between FileSink And FileRollSink? |
|
Answer» The major difference between HDFS FileSink and FileRollSink is that HDFS File Sink writes the events into the Hadoop Distributed File System (HDFS) whereas File Roll Sink stores the events into the local file system. |
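Side by side, hedged sketches of the two sinks (the paths and the roll interval are placeholders):

    # HDFS sink: events land in HDFS
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
    a1.sinks.k1.channel = c1

    # file_roll sink: events land on the local file system, rolled periodically
    a1.sinks.k2.type = file_roll
    a1.sinks.k2.sink.directory = /var/log/flume
    a1.sinks.k2.sink.rollInterval = 30
    a1.sinks.k2.channel = c2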
|
| 20. |
Does Apache Flume Support Third-party Plugins? |
|
Answer» Yes, Flume has a 100% plugin-based architecture; it can load and ship data from external sources to external destinations that live separately from Flume itself. This is why most big data analysis teams use this tool for streaming data. |
|
| 21. |
Does Apache Flume Provide Support For Third Party Plug-ins? |
|
Answer» Yes. Most data analysts use Apache Flume because of its plug-in based architecture: it can load data from external sources and transfer it to external destinations. |
|
| 22. |
Explain About The Replication And Multiplexing Selectors In Flume? |
|
Answer» Channel Selectors are used to handle multiple channels. Based on the Flume header value, an event can be written just to a single channel or to multiple channels. If a channel selector is not specified for the source, then by default it is the Replicating selector. Using the replicating selector, the same event is written to all the channels in the source's channels list. The Multiplexing channel selector is used when the application has to send different events to different channels. |
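For comparison, sketches of both selector types (channel names and the header are placeholders):

    # replicating (the default): every event is copied to all listed channels
    a1.sources.r1.channels = c1 c2
    a1.sources.r1.selector.type = replicating

    # multiplexing: events are routed by the value of a chosen header
    # a1.sources.r1.selector.type = multiplexing
    # a1.sources.r1.selector.header = priority
    # a1.sources.r1.selector.mapping.high = c1
    # a1.sources.r1.selector.default = c2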
|
| 23. |
Explain About The Different Channel Types In Flume. Which Channel Type Is Faster? |
|
Answer» The three different built-in channel types available in Flume are the MEMORY, JDBC, and FILE channels. The MEMORY channel is the fastest of the three, but it carries the risk of data loss; the JDBC channel persists events to an embedded database; the FILE channel persists events to disk. The channel you choose depends entirely on the nature of the big data application and the value of each event. |
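Hedged sketches of the memory and file channel definitions (capacities and paths are illustrative):

    # memory channel: fastest, but events are lost if the agent dies
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000
    a1.channels.c1.transactionCapacity = 1000

    # file channel: durable, events survive an agent restart
    a1.channels.c2.type = file
    a1.channels.c2.checkpointDir = /var/flume/checkpoint
    a1.channels.c2.dataDirs = /var/flume/data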
|
| 24. |
What Is A Channel? |
|
Answer» A channel stores events; events are delivered to the channel via sources operating within the agent. An event stays in the channel until a sink removes it for further transport. |
|
| 25. |
Is It Possible To Leverage Real Time Analysis On The Big Data Collected By Flume Directly? If Yes, Then Explain How? |
|
Answer» Yes. Data from Flume can be extracted, transformed, and loaded in real time into Apache Solr servers using MorphlineSolrSink. |
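A minimal sketch of wiring that sink, assuming a morphline configuration file already exists at the path shown (the path is a placeholder):

    a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
    a1.sinks.k1.morphlineFile = /etc/flume/conf/morphline.conf
    a1.sinks.k1.channel = c1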
|
| 26. |
What Is An Agent? |
|
Answer» An agent is a process that hosts Flume components such as sources, channels, and sinks, and thus has the ability to receive, store, and forward events to their destination. |
|
| 27. |
How Can Flume Be Used With Hbase? |
|
Answer» Apache Flume can be used with HBase through one of its two HBase sinks: HBaseSink or AsyncHBaseSink.
Working of the HBaseSink: in HBaseSink, a Flume event is converted into HBase increments or puts. The serializer implements HBaseEventSerializer and is instantiated when the sink starts. For every event, the sink calls the initialize method in the serializer, which then translates the Flume event into HBase increments and puts to be sent to the HBase cluster.
Working of the AsyncHBaseSink: the AsyncHBaseSink uses a serializer implementing AsyncHbaseEventSerializer; its initialize method is called only once by the sink when it starts. The sink invokes the setEvent method and then calls the getIncrements and getActions methods, similar to the HBase sink. When the sink stops, the cleanUp method is called by the serializer.
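A hedged sketch of the synchronous HBase sink; the table, column family, and serializer shown are example choices:

    # synchronous HBase sink writing to an example table
    a1.sinks.k1.type = hbase
    a1.sinks.k1.table = flume_events
    a1.sinks.k1.columnFamily = cf
    a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
    a1.sinks.k1.channel = c1

    # the asynchronous variant would instead use: a1.sinks.k1.type = asynchbase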
|
|
| 28. |
Which Is The Reliable Channel In Flume To Ensure That There Is No Data Loss? |
|
Answer» The FILE channel is the most reliable of the three channels (JDBC, FILE, and MEMORY). |
|
| 29. |
What Is Apache Flume? |
|
Answer» Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Review this Flume use case to learn how Mozilla collects and analyses logs using Flume and Hive. Flume is a framework for populating Hadoop with data. Agents are populated throughout one's IT infrastructure, for example inside web servers, application servers, and mobile devices, to collect data and integrate it into Hadoop. |
|
| 30. |
What Is Flume? |
|
Answer» Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic applications. |
|