Discussion questions for the Distributed Graphlab paper.

1. Learn about Bulk Synchronous Parallel (BSP) model. Consider a graph with four nodes connected in a ring A-B-C-D-A. In the initial state, the nodes have values 1,0,1,0 respectively. The update function for each node sets the new state of the node to the average of its two neighbors. For example, the new state of B will be the average of the states of A and C.

a. What is the execution under BSP? Does it converge?

b. Show two possible converging executions under GraphLab’s asynchronous execution model.

c. Is it possible for the asynchronous execution to not converge (assuming infinite precision arithmetic)? Why?

2. What are the advantages of the asynchronous programming model over BSP? What are the disadvantages?

3. Understand GraphLab’s different consistency models. The paper claims that given a vertex coloring of the graph, GraphLab can satisfy the edge-consistency model by synchronously executing all vertices of the same color.

a. Is there a bug in this claim?

b. Does the strategy above satisfy the full-consistency model?

c. Does the strategy above satisfy the vertex-consistency model?

4. Equation (3) in the paper is a useful equation to remember. (You are likely to use it in the future.) What is the optimal interval for a cluster of 1000 machines, a per machine MTBF of 1 year, and a checkpoint of 2 minutes?

5. (Optional) If you don’t mind some calculus, understand Young’s 2-page paper.

6. (Optional) I strongly recommend understanding the Chandy-Lamport algorithm for distributed snapshots. (https://www.youtube.com/watch?v=RQquDTYkHKY. The example starts at 6:24.) It is one of the most beautiful algorithms you will encounter.