A DataFrame organizes data into named columns and behaves like a table in an RDBMS.
| A Dataset is a distributed collection of data that provides the benefits of both RDDs and DataFrames.
|
A DataFrame doesn’t require us to supply schema or meta information about the record up front (the schema can be inferred), and it does not perform strict type checking.
| To create a Dataset, we need to provide schema information about the record, and the Dataset follows strict type checking.
|
DataFrame operations are written as untyped column expressions rather than lambda functions over typed objects.
| A Dataset supports lambda functions.
|
A DataFrame is optimized by the Catalyst optimizer in Spark SQL.
| A Dataset comes with the Spark SQL optimization engine, called the Catalyst optimizer.
|
A DataFrame doesn’t use encoders for domain objects at runtime; its rows are generic Row objects.
| A Dataset comes with encoders, which convert JVM objects into the Dataset's tabular representation.
|
A DataFrame is incompatible with domain objects: once a DataFrame is created, we can’t regenerate the domain object from it.
| Regenerating the domain object is possible, because a Dataset needs the schema information before it is created.
|
A DataFrame doesn’t support compile-time safety.
| A Dataset maintains the schema information; if the schema is incorrect, it generates an error at compile time.
|
Once a DataFrame is created, we can’t perform RDD operations on it directly (it must first be converted back to an RDD of Row objects).
| A Dataset lets us use RDD-style operations as well as the SQL query processor.
|
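
The type-checking and domain-object rows of the comparison can be illustrated with a small Scala sketch. It is only an illustration under stated assumptions: the Person case class, the sample data, and the local master setting are hypothetical and not part of the original comparison, and the code assumes a Spark 2.x or later dependency on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical domain class, used only for this illustration.
case class Person(name: String, age: Int)

object DataFrameVsDataset {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation (assumed setup).
    val spark = SparkSession.builder()
      .appName("DataFrameVsDataset")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings encoders, toDS/toDF and $"col" into scope

    val people = Seq(Person("Ann", 34), Person("Bob", 19))

    // Dataset: typed. The encoder for Person is derived implicitly,
    // and the filter is a lambda over the domain object.
    val ds: Dataset[Person] = people.toDS()
    val adults: Dataset[Person] = ds.filter(p => p.age >= 21)

    // DataFrame: untyped rows. Columns are referenced by name,
    // so a misspelled name is only detected at runtime.
    val df: DataFrame = ds.toDF()
    val adultsDf: DataFrame = df.filter($"age" >= 21)

    // ds.map(p => p.agee)   // would not compile: Person has no field `agee`
    // df.select("agee")     // compiles, but fails at runtime during analysis

    // Collecting the Dataset yields domain objects; the DataFrame yields Rows.
    val typed: Array[Person] = adults.collect()
    val untyped: Array[Row] = adultsDf.collect()

    println(typed.mkString(", "))
    println(untyped.mkString(", "))

    spark.stop()
  }
}
```

The two commented-out lines mark where the difference shows up: the typed version is rejected by the compiler, while the untyped version is accepted and only fails once Spark analyzes the query.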