jPregel

Java-based Pregel easily deployed on any cloud

Welcome to jPregel!

US airline route graph

jPregel is a version of Pregel written in Java. Applications that use it to solve graph problems can be written in either Java or Scala.
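For readers new to the Pregel model, the following is a minimal single-JVM sketch of the vertex-centric "superstep" style of computation, applied to single source shortest path on a small binary tree. It is not jPregel's API: the class name SsspSketch, the method shortestPaths, the heap-order vertex numbering, and the unit edge weights are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a single-JVM imitation of Pregel supersteps, not jPregel code. */
final class SsspSketch {

    /**
     * Shortest-path distances from vertex 0 on a binary tree with n vertices.
     * Assumptions: vertices are numbered in heap order (children of v are
     * 2v+1 and 2v+2), edges point from parent to child, every edge has weight 1.
     */
    static int[] shortestPaths(int n) {
        int[] distance = new int[n];
        Arrays.fill(distance, Integer.MAX_VALUE);

        // inbox.get(v) holds the messages delivered to vertex v for this superstep.
        List<List<Integer>> inbox = emptyBoxes(n);
        if (n > 0) inbox.get(0).add(0); // seed the source with distance 0

        boolean messagesSent = n > 0;
        while (messagesSent) { // one loop iteration plays the role of one superstep
            List<List<Integer>> outbox = emptyBoxes(n);
            messagesSent = false;
            for (int v = 0; v < n; v++) {
                // Vertex "compute" step: take the smallest incoming distance.
                int best = distance[v];
                for (int message : inbox.get(v)) best = Math.min(best, message);
                if (best < distance[v]) {
                    distance[v] = best;
                    // The value improved, so notify the children.
                    for (int c = 2 * v + 1; c <= 2 * v + 2 && c < n; c++) {
                        outbox.get(c).add(best + 1);
                        messagesSent = true;
                    }
                }
            }
            inbox = outbox; // stands in for the barrier between supersteps
        }
        return distance;
    }

    private static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
        return boxes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(shortestPaths(7))); // prints [0, 1, 1, 2, 2, 2, 2]
    }
}
```

In a real Pregel system the vertices are partitioned among workers and the messages cross the network; the sketch keeps everything in one loop only to show the shape of the computation.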

jPregel deploys Pregel applications in a manner that is independent of cloud service provider: it can deploy the distributed graph computation on any provider supported by jclouds, including such major providers as AWS, Rackspace, vCloud, Microsoft, HP, Ninefold, OpenStack, and Synaptic. Because jPregel builds on jclouds and Guice (for dependency injection), the set of clouds it can deploy to grows as the set of providers supported by jclouds grows. Since jPregel uses Maven for dependency management, keeping up with jclouds upgrades is automated. Our use of Maven also increases the efficiency and ease of deployment.

In addition, jPregel run times are suitable for serious large-scale applications. We achieve a high degree of concurrency, first, by declaring war on Java synchronized methods: we systematically avoid them, providing more fine-grained synchronization on critical data structures instead. We similarly declared war on the Java new keyword, using techniques that dramatically reduce garbage generation and improve garbage collection. Through these and myriad incremental performance improvements (e.g., graph construction is, of course, distributed among workers, but can also be concurrent within a worker), jPregel completes a standard benchmark, the single source shortest path (SSSP) problem on binary trees with 250 million nodes, in 46 seconds using 8 large EC2 instances as Workers. While this is not the fastest implementation for this benchmark problem, we believe jPregel’s performance and scalability are respectable. At spot prices, this run costs slightly more than two dollars ($2.16), with approximately 93% of the hour unused, or 12 cents/run.
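As a rough illustration of the two ideas above (fine-grained synchronization instead of synchronized methods, and avoiding object allocation on hot paths), here is a minimal sketch. It is not taken from jPregel: the class MinInbox and its methods are hypothetical, and the choice of java.util.concurrent.atomic.AtomicIntegerArray is ours. Each vertex slot is updated atomically on its own, so threads writing to different vertices never contend for a shared lock, and distances stay primitive ints, so offering a message allocates nothing.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Illustrative only: keeps the minimum distance offered to each vertex. */
final class MinInbox {
    private final AtomicIntegerArray best; // one independently updated slot per vertex

    MinInbox(int vertexCount) {
        best = new AtomicIntegerArray(vertexCount);
        for (int v = 0; v < vertexCount; v++) best.set(v, Integer.MAX_VALUE);
    }

    /** No synchronized method: only the slot for this vertex is updated atomically. */
    void offer(int vertex, int distance) {
        best.accumulateAndGet(vertex, distance, Math::min);
    }

    int get(int vertex) {
        return best.get(vertex);
    }

    /** Hypothetical demo: thread t offers distance v + t to every vertex v. */
    static MinInbox demo(int vertexCount, int threads) {
        MinInbox inbox = new MinInbox(vertexCount);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.execute(() -> {
                for (int v = 0; v < vertexCount; v++) inbox.offer(v, v + offset);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("demo tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return inbox;
    }

    public static void main(String[] args) {
        MinInbox inbox = demo(1000, 4);
        System.out.println(inbox.get(42)); // prints 42, the smallest value offered to vertex 42
    }
}
```

A single synchronized offer method would serialize all four threads; with per-slot atomic updates they conflict only when they happen to hit the same vertex at the same moment.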

We simultaneously improved jPregel’s object-oriented design, which is perhaps best reflected in a reduction in code size of over 40%. Coupling and dependence between system/application classes and job-specific attributes are minimized. Such attributes include the graph problem to be solved, the graph constructors (e.g., uniform binary tree, 2-D Euclidean), and the graph output methods; as far as possible, each such attribute can be injected independently.
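The idea of injecting each job attribute independently can be sketched with plain constructor injection. This is a sketch only, under stated assumptions: jPregel uses Guice for injection, whereas the code below wires things by hand to stay self-contained, and every name in it (JobSketch, GraphMaker, Problem, ResultWriter, and the sample implementations) is hypothetical rather than part of jPregel.

```java
/** Illustrative only: a job assembled from three independently chosen parts. */
final class JobSketch {

    /** Builds a graph as adjacency lists: adjacency[v] lists the out-neighbors of v. */
    interface GraphMaker { int[][] make(int vertexCount); }

    /** Computes one int value per vertex. */
    interface Problem { int[] solve(int[][] adjacency); }

    /** Renders the per-vertex values. */
    interface ResultWriter { String write(int[] values); }

    private final GraphMaker maker;
    private final Problem problem;
    private final ResultWriter writer;

    // Each attribute arrives through the constructor, so any one of them
    // can be swapped without touching the other two or the job itself.
    JobSketch(GraphMaker maker, Problem problem, ResultWriter writer) {
        this.maker = maker;
        this.problem = problem;
        this.writer = writer;
    }

    String run(int vertexCount) {
        return writer.write(problem.solve(maker.make(vertexCount)));
    }

    /** Binary tree in heap order: children of v are 2v+1 and 2v+2. */
    static final GraphMaker BINARY_TREE = n -> {
        int[][] adjacency = new int[n][];
        for (int v = 0; v < n; v++) {
            int left = 2 * v + 1, right = 2 * v + 2;
            if (right < n) adjacency[v] = new int[] {left, right};
            else if (left < n) adjacency[v] = new int[] {left};
            else adjacency[v] = new int[0];
        }
        return adjacency;
    };

    /** Simple path 0 -> 1 -> ... -> n-1. */
    static final GraphMaker PATH = n -> {
        int[][] adjacency = new int[n][];
        for (int v = 0; v < n; v++) {
            adjacency[v] = (v + 1 < n) ? new int[] {v + 1} : new int[0];
        }
        return adjacency;
    };

    /** A deliberately trivial problem: the out-degree of each vertex. */
    static final Problem OUT_DEGREE = adjacency -> {
        int[] degree = new int[adjacency.length];
        for (int v = 0; v < adjacency.length; v++) degree[v] = adjacency[v].length;
        return degree;
    };

    /** Comma-separated output. */
    static final ResultWriter CSV = values -> {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(values[i]);
        }
        return sb.toString();
    };

    public static void main(String[] args) {
        System.out.println(new JobSketch(BINARY_TREE, OUT_DEGREE, CSV).run(3)); // prints 2,0,0
        System.out.println(new JobSketch(PATH, OUT_DEGREE, CSV).run(3));        // prints 1,1,0
    }
}
```

Swapping BINARY_TREE for PATH changes only the graph constructor; the problem and the output method are untouched, which is the kind of independence described above.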

A jPregel client runs on the application developer’s local machine (e.g., by clicking the run button in an integrated development environment such as NetBeans or Eclipse), but deploys the computation onto a cloud simply and efficiently.

This project is licensed under the MIT license, which is provided in license.txt.