1.

How is a Dataset a better alternative compared to a DataFrame?

Answer»
DataFrame | Dataset
Organizes data into named columns and behaves like a table in an RDBMS (since Spark 2.0 it is an alias for Dataset[Row]). | Is a distributed collection of data that combines the benefits of RDDs (typed objects, functional operations) and DataFrames (optimized execution).
Has a schema, but its rows are generic Row objects, so column names and types are checked only at runtime. | Is created from a typed record (for example a case class) that defines the schema, and follows strict type checking.
Lambda functions only see untyped Row objects, so fields must be extracted by name or position. | Supports lambda functions that work directly on typed domain objects.
Is optimized by Catalyst, the Spark SQL optimizer. | Uses the same Catalyst optimizer, so typed code still benefits from Spark SQL optimization.
Uses only a generic Row encoder. | Comes with Encoders, which convert JVM objects to and from Spark's internal format.
Loses the domain type: once a DataFrame is created, domain objects cannot be regenerated from it without converting it to a Dataset. | Can regenerate domain objects, because the Dataset carries the type information for each record.
Does not provide compile-time safety: a wrong column name or type fails only at runtime. | Keeps the record type, so a wrong field name or type in a typed operation is rejected at compile time.
RDD-style operations on a DataFrame (or on df.rdd) work only with untyped Row objects. | Allows RDD-style operations (map, filter, flatMap) on typed objects alongside the SQL query processor.
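
The differences above can be illustrated with a minimal sketch. It assumes Spark SQL is on the classpath and runs in local mode; the `Person` case class, the object name, and the sample values are hypothetical and exist only for this example.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical domain class, used only for illustration.
case class Person(name: String, age: Int)

object DatasetVsDataFrame {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is enough for this demo.
    val spark = SparkSession.builder()
      .appName("dataset-vs-dataframe")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings encoders, toDS/toDF and $"col" into scope

    // Typed Dataset: the schema is derived from the Person case class.
    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 17)).toDS()

    // Typed lambda: p is a Person, so a misspelled field name would not compile.
    val adults: Dataset[Person] = people.filter(p => p.age >= 18)

    // Untyped view of the same data: columns are referenced by name.
    val df: DataFrame = people.toDF()
    val adultsDf: DataFrame = df.filter($"age" >= 18)
    // A misspelled column such as $"agee" would still compile and would only
    // fail at runtime, when Spark analyzes the query.

    // Converting back to a typed Dataset regenerates the domain objects.
    val restored: Dataset[Person] = df.as[Person]

    adults.show()
    adultsDf.show()
    restored.collect().foreach(println)

    spark.stop()
  }
}
```

Both filters go through the same optimizer; the practical difference is when mistakes are caught: the typed version at compile time, the column-based version at runtime.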

