1.

What is SchemaRDD in Spark RDD?

Answer»

SchemaRDD is an RDD of Row objects (wrappers around basic arrays of values such as integers or strings) combined with schema information that describes the data type of each column. It was designed to make life easier for developers when debugging code and running unit tests on Spark SQL modules. The schema describes the structure of the RDD in much the same way that a schema describes a table in a relational database. A SchemaRDD provides the basic functionality of ordinary RDDs along with some of the relational query interfaces of Spark SQL.

Consider an example. Suppose you have an RDD named Person that holds data about people. The SchemaRDD describes what each row of the Person RDD contains: if a person has the attributes name and age, the schema records those two columns and the data type of each.
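The Person example above can be sketched in plain Python to show the idea of "rows plus a schema". This is only a conceptual model, not Spark's API: `Field`, `SchemaRows`, `select`, `where` and `describe` are hypothetical names invented for the illustration, and the sample people are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


# Hypothetical stand-ins for illustration only; these are NOT Spark classes.
@dataclass(frozen=True)
class Field:
    name: str    # column name, e.g. "age"
    dtype: type  # expected Python type of the column, e.g. int


@dataclass
class SchemaRows:
    """Toy model of a SchemaRDD: plain rows plus a schema describing each column."""
    schema: List[Field]
    rows: List[Tuple[Any, ...]]

    def __post_init__(self) -> None:
        # Check every row against the schema, column by column.
        for row in self.rows:
            if len(row) != len(self.schema):
                raise ValueError(f"row {row!r} has the wrong number of columns")
            for value, field in zip(row, self.schema):
                if not isinstance(value, field.dtype):
                    raise TypeError(
                        f"{field.name} expects {field.dtype.__name__}, got {value!r}"
                    )

    def _index(self, name: str) -> int:
        # Look up a column position by name.
        for i, field in enumerate(self.schema):
            if field.name == name:
                return i
        raise KeyError(name)

    def describe(self) -> str:
        # Human-readable schema, similar in spirit to a table definition.
        return ", ".join(f"{f.name}: {f.dtype.__name__}" for f in self.schema)

    def select(self, *names: str) -> "SchemaRows":
        # Relational-style projection: keep only the named columns.
        idx = [self._index(n) for n in names]
        return SchemaRows(
            [self.schema[i] for i in idx],
            [tuple(row[i] for i in idx) for row in self.rows],
        )

    def where(self, name: str, predicate: Callable[[Any], bool]) -> "SchemaRows":
        # Relational-style filter on a single column.
        i = self._index(name)
        return SchemaRows(self.schema, [r for r in self.rows if predicate(r[i])])


# The Person data from the example: each row has a name and an age.
person = SchemaRows(
    schema=[Field("name", str), Field("age", int)],
    rows=[("Alice", 34), ("Bob", 19), ("Carol", 52)],
)

# Because the schema is known, columns can be queried by name.
adults = person.where("age", lambda age: age >= 21).select("name")
print(person.describe())
print(adults.rows)
```

Keeping the schema next to the rows is what makes by-name queries and type checks possible. In Spark itself the real class has more to it than this sketch, and from Spark 1.3 onward SchemaRDD was renamed DataFrame, so current code would use the DataFrame API instead.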


