1.

What Are The Most Memory-intensive Operations?

Answer»

If a query fails with an error indicating "memory limit exceeded", you might suspect a memory leak. The problem could actually be a query that is structured in a way that causes Impala to allocate more memory than you expect, exceeding the memory allocated for Impala on a particular node. Some examples of query or table structures that are especially memory-intensive are:

  • INSERT statements using dynamic partitioning, into a table with many different partitions. (Particularly for tables using Parquet format, where the data for each partition is held in memory until it reaches the full block size before it is written to disk.) Consider breaking up such operations into several different INSERT statements, for example to load data one year at a time rather than for all years at once.
  • GROUP BY on a unique or high-cardinality column. Impala allocates some handler structures for each different value in a GROUP BY query. Having millions of different GROUP BY values could exceed the memory limit.
  • Queries involving very wide tables, with thousands of columns, particularly with many STRING columns. Because Impala allows a STRING value to be up to 32 KB, the intermediate results during such queries could require substantial memory allocation.
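
The advice about splitting a large dynamic-partition INSERT can be sketched as a small script that generates one statement per year. This is only an illustration: the helper `yearly_inserts`, the table names `sales_parquet` and `sales_staging`, and the columns `id`, `amount`, `year`, and `month` are all hypothetical, and the sketch assumes a target table partitioned by `year` and `month`.

```python
def yearly_inserts(target, source, columns, years):
    """Return one INSERT statement per year instead of a single all-years INSERT.

    Each statement fixes the year partition and leaves only the month
    partition dynamic, so fewer partitions are written at the same time.
    """
    col_list = ", ".join(columns)
    statements = []
    for year in years:
        statements.append(
            f"INSERT INTO {target} PARTITION (year={year}, month) "
            f"SELECT {col_list}, month FROM {source} WHERE year = {year}"
        )
    return statements


# Hypothetical table and column names, for illustration only.
for stmt in yearly_inserts("sales_parquet", "sales_staging",
                           ["id", "amount"], range(2012, 2015)):
    print(stmt + ";")
```

Running the generated statements one after another keeps the number of partitions being buffered at any moment to at most the months of a single year, rather than every year and month combination at once.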
