1.

On Which Hosts Does Impala Run?

Answer:

Cloudera strongly recommends running the impalad daemon on each DataNode for good performance. This topology is not a hard requirement, but if there are data blocks with no Impala daemon running on any of the hosts that hold replicas of those blocks, queries involving that data could be very inefficient. In that case, the data must be transmitted over the network from one host to another for processing (a "remote read"), a condition Impala normally tries to avoid.
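
To make the remote-read condition concrete, here is a minimal Python sketch that checks which blocks have no replica on a host running impalad. It is only an illustration of the rule described above, not Impala's actual scheduling logic. The function name, host names, and block IDs are all hypothetical.

```python
from typing import Dict, List, Set


def blocks_needing_remote_reads(
    block_replicas: Dict[str, Set[str]],
    impalad_hosts: Set[str],
) -> List[str]:
    """Return IDs of blocks with no replica on any host running impalad."""
    return sorted(
        block_id
        for block_id, replica_hosts in block_replicas.items()
        # An empty intersection means no daemon is local to any replica.
        if not (replica_hosts & impalad_hosts)
    )


# Hypothetical cluster: six DataNodes, impalad running on only three of them.
block_replicas = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn3", "dn4", "dn5"},
    "blk_3": {"dn4", "dn5", "dn6"},
}
impalad_hosts = {"dn1", "dn2", "dn3"}

remote = blocks_needing_remote_reads(block_replicas, impalad_hosts)
print(remote)  # ['blk_3']
```

In this made-up layout, blk_3 has no replica on a host with a daemon, so reading it would require a remote read. Running impalad on every DataNode makes the returned list empty by construction.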


