InterviewSolution
1. What are Event Time and Stateful processing in Streaming?
Answer» Event Time: In stream-processing systems, there are effectively two relevant times for each event: the time at which it actually occurred (event time) and the time at which it was processed by, or reached, the stream-processing system (processing time).

Event time: Event time is the time embedded in the data itself. It is most often, though not required to be, the time at which the event actually occurred. It is important to use because it provides a more robust way of comparing events against one another. The challenge is that event data can arrive late or out of order, so the stream-processing system must be able to handle out-of-order and late data.

Processing time: Processing time is the time at which the stream-processing system actually receives the data. It is usually less important than event time, because when an event happens to be processed is largely an implementation detail. Processing time can never be out of order, since it is a property of the streaming system at a given moment.

Stateful Processing: Stateful processing is necessary only when you need to use or update intermediate information (state) over longer periods of time, in either a micro-batch or a record-at-a-time approach. This can happen when you are using event time or when you are performing an aggregation on a key, whether or not that aggregation involves event time. For the most part, Spark handles this complexity for you: when you specify a grouping, Structured Streaming maintains and updates the state, and you simply supply the logic. While performing a stateful operation, Spark stores the intermediate information in a state store. Spark's current state store implementation is an in-memory store that is made fault tolerant by writing intermediate state to the checkpoint directory.
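To make the event-time vs. processing-time distinction concrete, below is a minimal, hypothetical sketch in plain Python (not Spark's API) of a stateful tumbling-window count keyed by event time, with a watermark that drops data arriving too late. In Structured Streaming itself the equivalent idea is expressed declaratively with `withWatermark` and `groupBy(window(...))`, and Spark manages the state store for you; the names `window_counts`, `WINDOW`, and `LATENESS` here are illustrative assumptions, not Spark identifiers.

```python
from collections import defaultdict

WINDOW = 10   # tumbling-window size, in seconds of event time (assumed)
LATENESS = 5  # allowed lateness before data is dropped (the watermark delay)

def window_counts(events):
    """events: iterable of (event_time, value) in *arrival* order,
    which may be out of order with respect to event_time."""
    counts = defaultdict(int)       # window start -> count: the "state"
    max_event_time = float("-inf")  # drives the watermark forward
    for event_time, _value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - LATENESS
        if event_time < watermark:
            continue  # too late: state for this event's window is finalized
        counts[event_time // WINDOW * WINDOW] += 1
    return dict(counts)

# Out-of-order arrival: the event with event time 12 arrives after
# the one with event time 25, so the watermark (25 - 5 = 20) drops it.
arrivals = [(3, "a"), (25, "b"), (12, "c"), (5, "d")]
print(window_counts(arrivals))  # {0: 1, 20: 1}
```

The sketch shows why event time forces the system to keep state: a window's count can still change while late-but-allowed events trickle in, and the watermark is what lets the system eventually finalize and discard that state instead of holding it forever.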