1.

What Are The Common Configurations In Apache Storm?

Answer»

There are a variety of configurations you can set PER topology. A list of all the configurations you can set can be found here. The ones prefixed with “TOPOLOGY” can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden).

Here are some common ones that are set for a topology:

1 Config.TOPOLOGY_WORKERS : This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components in the topology, each worker process will have 6 tasks running within it as threads.

2 Config.TOPOLOGY_ACKER_EXECUTORS : This sets the number of executors that will track tuple trees and detect when a spout tuple has been fully processed By not SETTING this VARIABLE or setting it as null, Storm will set the number of acker executors to be EQUAL to the number of workers configured for this topology. If this variable is set to 0, then Storm will immediately ack tuples as soon as they come off the spout, effectively disabling reliability.

3 Config.TOPOLOGY_MAX_SPOUT_PENDING : This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed YET). It is highly recommended you set this config to prevent queue explosion.

4 Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS : This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies.

5 Config.TOPOLOGY_SERIALIZATIONS : You can register more serializers to Storm using this config so that you can use custom types within tuples.

There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found here. The ones prefixed with “TOPOLOGY” can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden).

Here are some common ones that are set for a topology:

1 Config.TOPOLOGY_WORKERS : This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components in the topology, each worker process will have 6 tasks running within it as threads.

2 Config.TOPOLOGY_ACKER_EXECUTORS : This sets the number of executors that will track tuple trees and detect when a spout tuple has been fully processed By not setting this variable or setting it as null, Storm will set the number of acker executors to be equal to the number of workers configured for this topology. If this variable is set to 0, then Storm will immediately ack tuples as soon as they come off the spout, effectively disabling reliability.

3 Config.TOPOLOGY_MAX_SPOUT_PENDING : This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.

4 Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS : This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies.

5 Config.TOPOLOGY_SERIALIZATIONS : You can register more serializers to Storm using this config so that you can use custom types within tuples.



Discussion

No Comment Found