1.

What is GraphX? How do RDDs relate to a graph, what operations can be performed on the graph, and what is the syntax for creating a graph in GraphX?

Answer»

GraphX is the component of the Spark framework used for graphs and graph-parallel processing. GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge.

  • Extends the Spark RDD API to perform computation on graphs.
  • Follows a directed multigraph data structure, so several parallel edges may connect the same pair of vertices.
  • Supports join operators such as joinVertices and outerJoinVertices, plus aggregateMessages, which replaced the older mapReduceTriplets.
  • Ships with built-in graph algorithms such as PageRank, connected components, and triangle counting.
  • GraphX optimizes the representation of vertex and edge types when they are primitive data types (e.g., int, double), reducing the in-memory footprint by storing them in specialized arrays.
  • GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages).
  • The package to import is org.apache.spark.graphx._ (import org.apache.spark.graphx._). A graph is then built from a vertex RDD and an edge RDD:
  • val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)
  • val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
  • val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)
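The creation syntax and the operators listed above can be put together in a small program. This is only a minimal sketch: the names, ages, and edge weights in vertexArray and edgeArray are made-up sample data, the object and helper names are hypothetical, and it assumes Spark with GraphX on the classpath and a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object GraphXSketch {
  // Hypothetical sample data: (vertexId, (name, age)).
  val vertexArray: Array[(VertexId, (String, Int))] = Array(
    (1L, ("Alice", 28)),
    (2L, ("Bob", 27)),
    (3L, ("Charlie", 65))
  )

  // Hypothetical edges: Edge(sourceId, destinationId, weight).
  val edgeArray: Array[Edge[Int]] = Array(
    Edge(1L, 2L, 7),
    Edge(2L, 3L, 4),
    Edge(3L, 1L, 2)
  )

  // Build the property graph from a vertex RDD and an edge RDD.
  def buildGraph(sc: SparkContext): Graph[(String, Int), Int] = {
    val vertexRDD: RDD[(VertexId, (String, Int))] = sc.parallelize(vertexArray)
    val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
    Graph(vertexRDD, edgeRDD)
  }

  // aggregateMessages: every edge sends 1 to its destination and the
  // messages are summed, which yields the in-degree of each vertex.
  def inDegreeCounts(graph: Graph[(String, Int), Int]): VertexRDD[Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; on a cluster the master is set elsewhere.
    val conf = new SparkConf().setAppName("GraphXSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc)
      println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")

      // subgraph: keep only vertices whose age is above 27; edges that
      // lose an endpoint are dropped as well.
      val older = graph.subgraph(vpred = (id, attr) => attr._2 > 27)
      older.vertices.collect().foreach(println)

      // joinVertices: merge the in-degree into each vertex property.
      // Adding it to the age is purely illustrative.
      val counts = inDegreeCounts(graph)
      val updated = graph.joinVertices(counts)(
        (id, attr, count) => (attr._1, attr._2 + count)
      )
      updated.vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell the SparkContext already exists as sc, so only the body of buildGraph and the operator calls would be needed there.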

Like RDDs, graphs in GraphX are immutable, distributed, and fault-tolerant. Changes to the values or structure of a graph are accomplished by producing a new graph with the desired changes.
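Immutability can be seen by transforming a graph and then inspecting the original. The following is a rough sketch under the same assumptions as before (made-up sample data, hypothetical object and helper names, a local master): mapVertices returns a new graph and leaves the one it was called on unchanged.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object ImmutableGraphSketch {
  // Hypothetical two-vertex graph: (vertexId, (name, age)) and one edge.
  def sampleGraph(sc: SparkContext): Graph[(String, Int), Int] = {
    val vertices = sc.parallelize(Seq((1L, ("Alice", 28)), (2L, ("Bob", 27))))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 7)))
    Graph(vertices, edges)
  }

  // Returns a NEW graph in which every age is increased by one.
  def birthday(graph: Graph[(String, Int), Int]): Graph[(String, Int), Int] =
    graph.mapVertices((id, attr) => (attr._1, attr._2 + 1))

  // Collect the ages to the driver as a plain Map for easy comparison.
  def ages(graph: Graph[(String, Int), Int]): Map[VertexId, Int] =
    graph.vertices.collect().map { case (id, (_, age)) => id -> age }.toMap

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ImmutableGraphSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = sampleGraph(sc)
      val aged = birthday(graph)
      // The original graph still holds the old ages; only `aged` differs.
      println(s"original: ${ages(graph)}")
      println(s"new:      ${ages(aged)}")
    } finally {
      sc.stop()
    }
  }
}
```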


