|
Answer» In Impala 1.2 and higher, there is much less need to use the REFRESH and INVALIDATE METADATA statements:
- The new impala-catalog service, represented by the catalogd daemon, broadcasts the results of Impala DDL statements to all Impala nodes. Thus, if you issue a CREATE TABLE statement in Impala while connected to one node, you do not need to issue INVALIDATE METADATA before running queries through a different node.
- The catalog service only recognizes changes made through Impala. You must still issue a REFRESH statement if you load data through Hive or by manipulating files in HDFS, and you must issue an INVALIDATE METADATA statement if you create a table, alter a table, add or drop partitions, or run other DDL statements in Hive.
- Because the catalog service broadcasts the results of REFRESH and INVALIDATE METADATA statements to all nodes, in the cases where you do still need to issue those statements, you can do so on a single node rather than on every node, and the changes are automatically recognized across the cluster. This makes it more convenient to load-balance by issuing queries through arbitrary Impala nodes rather than always using the same coordinator node.
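As a rough illustration of the cases above, the statements might look like the following sketch. The table name sales_data is hypothetical and not part of the original answer. INVALIDATE METADATA is shown without a table name, which affects all tables; whether a table name can be given depends on the Impala release.

```sql
-- Hypothetical table name (sales_data), used only for illustration.

-- Case 1: data files were added through Hive or directly in HDFS.
-- Reload the file metadata for the existing table.
REFRESH sales_data;

-- Case 2: a table was created or altered, or partitions were added
-- or dropped, through Hive. Discard the cached metadata so that
-- Impala reloads it from the metastore.
INVALIDATE METADATA;

-- Either statement can be issued through any single node. A query
-- sent through a different node afterwards should see the change.
SELECT COUNT(*) FROM sales_data;
```

Tables created or altered through Impala itself need neither statement, because the catalog service broadcasts those changes.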
|