Developing jPregel Applications

A tutorial introduction

Developing Applications

merlin with student

Table of Contents

  1. Birds eye tour of a jPregel client
    1. What's in a Job?
    2. The Job directory structure
  2. Defining a Vertex class: Encapsulating a distributed graph algorithm
  3. The Vertex API
    1. A tutorial of Vertex methods goes here
  4. Making a Master Graph Maker
  5. Making a Worker Graph Maker
  6. Making a Master Output Maker
  7. Making a Worker Output Maker

Birds eye tour of a jPregel client

A jPregel client is a concise sequence of instructions illustrated below. It first specifies the Job object (lines 11 - 19). Then, it:

  1. (line 22) reserves and launches a jPregel cluster, specifying the number of worker machines as an argument, and obtaining a remote reference to the jpregel master.
  2. (line 23) runs the job
Following the listing, we delve into the Job object.

	 1 package clients;
	 2 import system.*;
	 3
	 4 public class SourcesLocalClient
	 5 {
	 6     /**
	 7      * @param args [0]: Job directory name
	 8      */
	 9     public static void main( String[] args ) throws Exception
	10     {
	11         Job job = new Job(
	12                 "Identify source nodes",  // jobName
	13                 args[0],                          // jobDirectoryName
	14                 new VertexSources(),     // vertexFactory
	15                 new MasterGraphMakerStandard(),
	16                 new WorkerGraphMakerStandard(),
	17                 new MasterOutputMakerStandard(),
	18                 new WorkerOutputMakerStandard()
	19                 );
	20         int numWorkers = 1;
	21         System.out.println( job + "\n    numWorkers: " + numWorkers );
	22         ClientToMaster master = LocalReservationService.newCluster( numWorkers );
	23         System.out.println( master.run( job ) );
	24         System.exit( 0 );
	25     }
	26 }
	

What's in a Job?

We briefly enumerate the parameters in the Job constructor:

  1. Job Name - A String describing the job/computation. In our example, we simply use the String "Identify source nodes" since that describes the example computation.
  2. Job Directory - A String indicating the job directory or job bucket, which is used for job input/output. In our example, this was passed in as the 1st command-line argument, args[0]. The example client specifies a local execution (as opposed to an AWS execution). For local executions, the job directory is specified as a path relative to our Netbeans project, which for us was "examples/Sources": In our Netbeans project jpregel-aws, we have a directory called examples in which there is a directory called Sources. For an AWS execution, we would specify an S3 bucket name. These will be explained later.
  3. Vertex Factory - A Vertex object whose compute method encapsulates the graph problem we would like to solve. In our example, we pass in a VertexSources object, since its compute method encapsulates the identify sources computation. Please see the API documentation for information about the vertex classes graph are included in the jpregel installation. Later in the tutorial, we discuss making your own vertex class.
  4. Master Graph Maker - A master graph maker, which reads an input file and creates 1 input file per worker. The workers then read their respective input file and construct a part of the graph on which the computation will be run. In our example, we instantiated a standard master graph maker, which reads an input file for a graph in a standard form. Please see the API documentation for information about the master graph makers that are included in the jpregel installation. Later in the tutorial, we discuss making your own master graph maker.
  5. Worker Graph Maker - A worker graph maker, which reads an input file, created by the master graph maker, and constructs vertices using the Vertex Factory specified above. In our example, we instantiated a standard worker graph maker, which reads a standard worker input file and constructs the indicated vertices of a basic graph. Please see the API documentation for information about the worker graph makers that are included in the jpregel installation. Later in the tutorial, we discuss making your own worker graph maker.
  6. Master Output Maker - A master output maker, which reads the output files produced by each of the workers, and produces an output file for the overall graph computation. In our example, we instantiated a standard master output maker, which reads a standard worker output file and constructs a standard graph output. Please see the API documentation for information about the master output makers that are included in the jpregel installation. Later in the tutorial, we discuss making your own master output maker.
  7. Worker Output Maker - A worker output maker, invokes the output method on each of its vertices and produces an output file that will be read by the master output maker. In our example, we instantiated a standard worker output maker, which produces a file with 1 line per vertex output 1 vertex output per line. Please see the API documentation for information about the worker output makers that are included in the jpregel installation. Later in the tutorial, we discuss making your own worker output maker.

The phases of a job are:

  1. Construct the graph
  2. Run the graph computation
  3. Collect the computation's output

Job directory structure

Before we discuss constructing the graph, we introduce the Job Directory structure, illustrated below. Each job uses this file/directory structure. Directories are represented with blue rectangles; files are red.

Job Directory sructure with n worker machines
Job Directory structure when there are n worker machines.

Constructing the graph

This construction process is as follows:

  1. The master graph maker reads the file named "input" in the job directory;
  2. The master graph maker creates n files in the job directory's in sub-directory, named 1, 2, ..., n, where there are n workers;
  3. The worker whose worker number is w reads the file named w in the in sub-directory and constructs the vertices indicated by that file.

Defining a Vertex class

Encapsulating a distributed graph algorithm

A Vertex class directly or indirectly extends VertexImpl, which implements the Vertex interface. To be continued ...