1.

What are the profilers in PySpark?

Answer»

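PySpark supports custom profilers. A profiler collects performance statistics about the Python code that a job runs, which helps in reviewing where the job spends its time. The default profiler, BasicProfiler, is built on cProfile and an Accumulator; a custom profiler lets you use a different profiler or write the results in a different format. A custom profiler has to define or inherit the following methods: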

  • profile: Produces a system profile of some sort.
  • stats: Returns the collected profiling stats.
  • dump(id, path): Dumps the profile for a specific RDD id to the given path.
  • add: Adds a profile to the existing accumulated profile.

The profiler class has to be selected when the SparkContext is created.
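
To make the methods listed above concrete, here is a minimal sketch that uses only Python's standard cProfile and pstats modules. The class name SketchProfiler and the function work are hypothetical stand-ins for illustration, not part of PySpark. In a real job you would more likely subclass pyspark.profiler.BasicProfiler, pass the class through the profiler_cls argument of SparkContext, and set spark.python.profile to true.

```python
import cProfile
import os
import pstats
import tempfile


class SketchProfiler:
    """Hypothetical stand-in that mirrors the profiler methods; not PySpark's own class."""

    def __init__(self):
        self._stats = None  # accumulated pstats.Stats, None until the first run

    def profile(self, func, *args, **kwargs):
        """Run func under cProfile and fold the result into the accumulated stats."""
        pr = cProfile.Profile()
        result = pr.runcall(func, *args, **kwargs)
        self.add(pstats.Stats(pr))
        return result

    def add(self, new_stats):
        """Merge a new profile into the existing accumulated profile."""
        if self._stats is None:
            self._stats = new_stats
        else:
            self._stats.add(new_stats)

    def stats(self):
        """Return the collected stats (a pstats.Stats object, or None)."""
        return self._stats

    def dump(self, id, path):
        """Write the accumulated stats for the given id into the directory path."""
        os.makedirs(path, exist_ok=True)
        target = os.path.join(path, "rdd_%d.pstats" % id)
        self._stats.dump_stats(target)
        return target


def work(n):
    # Hypothetical workload to profile.
    return sum(i * i for i in range(n))


profiler = SketchProfiler()
total = profiler.profile(work, 1000)   # first profile
profiler.profile(work, 1000)           # second run is merged through add()
out_file = profiler.dump(0, tempfile.mkdtemp())
```

The dumped file can be loaded again with pstats.Stats(out_file) and inspected with sort_stats and print_stats.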

