Lecture Note 6

Interconnection Networks: Array and Mesh

Interconnection network
An interconnection network connects the processors together, and transports data among them.
An interconnection network has its own topology. In this lecture we will focus on one dimensional array and multi-dimension networks.
An interconnection network can be viewed as a graph, in which every node is a processor, and every link is a communication channel.
The  topology of a network is essential to the communication performance.
Diameter
The diameter is the maximum distance between two nodes in the network. The distance is measured in number links from one node to the other.
The larger the diameter, the longer it will take to send message from one end to the other end of the network.
Bisection
The bisection of a network is the minimum number of links we need to cut in order to  divide the network in to two halves of roughly equal size.
The larger the bisection, the less often the messages will contend with one another while going from one half of the network to the other.
Array
One dimensional array is the "least connected" topology to connect processors together, and also the least expensive to build
"Compare and select" sorting
We pump the data from one end of the array one at a time.
The processor does the following after receiving a data.
If it has no data, then it keeps the incoming data.
Otherwise it compares the one it has with the incoming data, and selects the larger one to be sent to its neighbor.
Finally all the data will fall into the correct position.
All the figures, unless stated otherwise, are taken from "Introduction to Parallel Algorithms and Architectures: Arrays, Trees, hypercubes," by F. T. Leighton, Morgan Kaufmann, ISBN 1-55860-117-1
Analysis
Both time and the number of processors used are n (n is the number of data), so the cost is n2
Bubble sort
A simple algorithm to sort the data in an array.
Notice that the latter stage can be moved "earlier" to reduce the time.
Even-odd transportation sort
Even-odd transportation sort works in phases. In even phases, the even numbered processors compare and exchange data with its right-hand side neighbors. In odd phases the odd numbered processors do the same.
After n even phases and n odd phases (n is the number of data), the array is sorted.
Proof of correctness
We use 0-1 principle to prove the correctness of the algorithm.
Consider the rightmost 1. This 1 will move either in the first or the second phase, and once it is moving, it will not stop until it hits the end of the array.
Consider the second rightmost 1. The latest time step This 1 will start moving is third phase, and similarly, once it starts moving, it will not stop until it his the second-to-the-right cell. 
In general the ith rightmost 1 starts moving at time step i + 1, and will reach the n - i + 1 cell.
Everyone reaches his destination in n steps.
Analysis
Both time and the number of processors used are n (n is the number of data), so the cost is n2
Inner product
Two vectors are fed into both ends of the array for inner product.
The timing is everything.
Two dimensional mesh
Snake sort
The algorithm
The snake sort proceeds in phases. In the first phase each row is sorted so that even numbered rows have the largest number to the right, and the odd numbered rows have the largest number to the left.
In the second phase each column is sorted so that the smallest numbers appearing at the top of the columns.
After 2log n phases, where n is the number of rows, the data becomes sorted.
Proof of correctness
Again we use 0-1 principle.
Let's call a row full of 0's (or 1's) clean, otherwise the row is dirty.
Two consecutive dirty rows will produce at least one clean row since they have opposite orientation of 0's and 1's.
The number of dirty rows decreases by half after one row sort and one column sort.
There will be at most one dirty row after 2log n phases.
Analysis
The number of phases is 2log n, so the total time is O(n log n).
A better snake sort
The time of snake sort is not optimal.
The following program can sort the data in a mesh in optimal time.

  1. Divide the mesh into blocks of N3/8 by N3/8  and sort them with snake sort. 
    At most one row in each block is dirty.
  2. Distribute the columns among vertical slices.
  3. Sort each block in snake order. 
    At most two rows in a horizontal slice are dirty.
  4. Sort each column. 
    The number of zeros in each columns in the same vertical slice differ by at most N3/8.
  5. Sort the first and the second blocks, the third and fourth blocks, ...  of each vertical slice into snake order.
  6. Sort the second and the third blocks, the fourth and the fifth blocks, ...  of each vertical slice into snake order. 
    Each vertical slice contains at most one dirty row.
    At most two overall dirty rows since the numbers of 1's in the vertical slices differ by at most N1/4.  
  7. Sort each row.
  8. Perform 2N3/8 steps of even-odd transportation sort.
Analysis
Steps 1, 3, 5, and 6 -- O(N3/8log N) time steps.
Steps 4 and 7 -- 2N1/2 time steps.
Steps 2 -- N1/2 +O(N3/8) time steps.
Step 8 -- 2N3/8 time steps.
Total -- 3N1/2 + o(N1/2) . This matches the lower bound we will describe later..
A matching lower bound for sortinf
Here we want to establish a better lower bound than the obvious one  -- 2N1/2
2N1/2 unknown values are in the upper left-hand corner, and the numbers from 1 to N - 2N1/2 are stored elsewhere.
Let x be the element at the lower right-hand corner  at step 2 N1/2 - 2 N1/4 -3, and x is independent from those unknowns.
Assume that the unknown area has m 0's and 2N1/2 - m 1's. If m varies, x must go to different columns (i.e., it will be pushed by those 0's to the correct column).
We just m so that x must go to the first column and the result follows.
Routing on a two dimensional mesh
It is never a problem in routing messages for a array (using a naive greedy algorithm).
No contention occurs.
We use the following simple greedy algorithm to route messages in a mesh.
First all the messages are routed to the correct columns.
Then the messages go to the correct rows. The message going the longest distance is given the highest priority when contention occurs.
The proof is similar to the one in even-odd transportation sort.