Maiter: Asynchronous Graph Processing

What is Maiter?

The Maiter project started at University of Massachusetts Amherst in 2012. Maiter is a message-passing framework that supportsĀ delta-based accumulative iterative computations(DAIC). The DAIC computation model is a new concept of iterative computation that supports asynchronous execution, which is very important for large-scale distributed iterative computation.

Based on DAIC:

* We can process vertex state updates in delta manner, which means only state changes are considered for computation. The non-change vertex (delta is zero) is skipped for processing.

* We can build asynchronous dataflow model. Every worker is processing a data partition totally independent from each other worker. There is no synchronization barrier in the traditional iterative computation. Fast workers perform more computations, while slow workers perform less computations.

* We can perform flexible scheduling policies under the asynchronous processing engine, including round robin scheduling, priority scheduling, filter scheduling, and so on.

Maiter is a C++ framework, which runs on hundreds of distributed commodity PCs or in the cloud. It is implemented by modifying Piccolo. Maiter's idea comes fromĀ PrIter, which supports prioritized iteration. Maiter is

* more general, providing general computation model and flexible scheduling policies including round robin, priority, and filtering.

* more efficient, providing asynchronous updates with efficient computations by using the most recent results

* more scalable, holding up to 2 billion nodes for PageRank on Amazon EC2 Cloud with 100 instances

* more faster, 2x-10x faster than PrIter and Piccolo

Why Maiter?

Myriad of machine learning and data mining algorithms require parsing data sets iteratively. The distributed computing frameworks proposed so far either do not support iterative computation or support iterative computation assuming that synchronizations have to be performed. Eliminating synchronization barriers in iterative computation can significantly improve efficiency, especially under a large-scale heterogeneous environment. In addition, asynchronous iteration allows to use the most recent result to perform iterative updates, which helps improve computation efficiency.

Maiter is such a framework that supports asynchronous delta-based accumulative iterative computation. Delta-based Accumulative Iterative Computation is a new technique to perform iterative computations with more efficient computation and avoiding synchronization barriers.

How is Maiter?

The experimental results performed on 100-node EC2 cloud show

http://fmn.rrimg.com/fmn063/20140618/1105/original_k2gL_7cfa0000026f125f.jpg

http://fmn.rrfmn.com/fmn061/20140618/1110/original_OHrS_7baa000002b3125f.jpg

How to use Maiter?

The software stack of Maiter

http://fmn.rrfmn.com/fmn060/20140618/1600/original_Zhgs_62e7000009ea1191.jpg



For any usage problem, please contact with us:

* Yanfeng Zhang (zhangyf@cc.neu.edu.cn)


Current Work

  1. Looking for more DAIC algorithms (HADI algorithm, Affinity Propagation algorithm, ...)
  2. Automatic tranlation engine that can translate MapReduce/GraphLab code to Maiter code
  3. Partition schemes that helps asynchronous processing
  4. Maiter with hybrid storage support (DRAM/SSD/HDD)
  5. Incremental Computation

http://fmn.rrimg.com/fmn066/20140618/1615/original_pTMj_128400000a671190.gif