InterviewSolution
1. Explain the importance of Distributed Cache in Hadoop?
Answer» In Hadoop, Distributed Cache is a utility provided by the MapReduce framework. Briefly, it can cache files such as jar files, archives, and text files that an application needs. When a MapReduce job is running, this utility caches the read-only files and makes them available to all the DataNodes; each DataNode gets a local copy of the file. Thus, every task can access these files locally. The files remain on the DataNodes while the job is running and are deleted once the job is completed. The default size of the Distributed Cache is 10 GB, which can be adjusted as required using the local.cache.size property.
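
As a rough illustration of how this is used in practice, here is a minimal sketch with the newer org.apache.hadoop.mapreduce API: the driver registers a read-only file with job.addCacheFile(), and each mapper reads its local copy in setup(). The path /config/lookup.txt and the class names are hypothetical, not part of the original answer.

import java.net.URI;
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheExample {

    public static class CacheMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void setup(Context context) throws java.io.IOException {
            // Each node holds a local copy of the cached file, so it can be
            // opened like an ordinary local file from the task's working directory.
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles != null && cacheFiles.length > 0) {
                Path localPath = new Path(cacheFiles[0].getPath());
                BufferedReader reader =
                        new BufferedReader(new FileReader(localPath.getName()));
                // ... read the lookup data into memory for use in map() ...
                reader.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "distributed cache example");
        job.setJarByClass(CacheExample.class);
        job.setMapperClass(CacheMapper.class);
        // Register the read-only file; the framework copies it to every node
        // before the tasks start and removes it when the job finishes.
        job.addCacheFile(new URI("/config/lookup.txt"));
        // ... set input/output paths and submit the job as usual ...
    }
}

This pattern is commonly used for map-side joins, where a small lookup table is cached on every node so the mappers can join against it without an extra reduce phase.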