How would I write a simple embarrassingly parallel application in DUP?

This answer presents the DUP code for the simplest possible case: the work is specified in an input file in which each line represents one unit of work.

We further assume that the stage doing the actual work is a simple program that reads work units from standard input and writes its results to standard output. In other words, the program can be run sequentially using:

$ my_prog < input.txt > output.txt
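
For illustration, here is a minimal sketch of what such a worker could look like in Python. The process function and its upper-casing behaviour are purely hypothetical placeholders for the real work; the only properties that matter are that the program reads lines from standard input and writes one result line per input line to standard output. Flushing after every line is a cautious choice made for this sketch, not a documented DUP requirement.

```python
import sys


def process(unit: str) -> str:
    """Hypothetical per-unit work; here we simply upper-case the line."""
    return unit.upper()


def run(lines, out):
    """Read work units line by line and write one result line per unit."""
    for line in lines:
        unit = line.rstrip("\n")
        result = process(unit)
        # Replace embedded newlines so each unit yields exactly one output line.
        out.write(result.replace("\n", " ") + "\n")
        # Flush per line so results are not held back in a buffer (assumption).
        out.flush()


if __name__ == "__main__":
    run(sys.stdin, sys.stdout)
```

Saved under a hypothetical name such as my_prog.py, this could be run sequentially with python3 my_prog.py < input.txt > output.txt.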

In this case, we can distribute the input over a number of hosts using DUP's deal stage, which hands out the input lines in round-robin fashion to the processing stages. Afterwards, we need to collect the output. If the order of the output does not matter, we can use faninany. If the order needs to be preserved, we can use the gather stage instead, which reads its inputs in round-robin fashion. Both faninany and gather operate one line at a time, so the main stage must be written to produce exactly one line of output for each line of input.
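
To see why round-robin dealing followed by round-robin gathering preserves the input order, the following Python sketch models the two stages with in-memory lists. It is only a simulation of the behaviour described above, not DUP code: the function names deal, gather and pipeline are chosen here for illustration, and the worker is assumed to emit exactly one line per input line.

```python
from itertools import cycle


def deal(lines, n):
    """Distribute lines round-robin over n queues, one queue per worker."""
    queues = [[] for _ in range(n)]
    for i, line in enumerate(lines):
        queues[i % n].append(line)
    return queues


def gather(queues):
    """Take one line from each queue in turn; stop at the first empty queue."""
    iters = [iter(q) for q in queues]
    out = []
    for it in cycle(iters):
        try:
            out.append(next(it))
        except StopIteration:
            return out
    return out  # only reached when there are no queues at all


def pipeline(lines, n, work):
    """Model of deal -> n workers -> gather with a one-line-per-line worker."""
    queues = deal(lines, n)
    results = [[work(line) for line in q] for q in queues]
    return gather(results)


if __name__ == "__main__":
    print(pipeline(["a", "b", "c", "d", "e", "f"], 4, str.upper))
    # prints ['A', 'B', 'C', 'D', 'E', 'F']
```

Because worker k receives lines k, k+n, k+2n and so on, reading one result from each worker in turn reproduces the original sequence. If a worker emitted zero or several lines for some input, the turns would no longer line up with the input lines, which is why the one-line-per-line rule matters.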

Given all of these constraints, the necessary DUP code for four hosts would look like this:

dist@localhost[DUP:0|0.1|p1:0,3|p2:0,4|p3:0,5|p4] $ deal ;
p1 @host1[1|merg:0] $ my_prog ;
p2 @host2[1|merg:3] $ my_prog ;
p3 @host3[1|merg:4] $ my_prog ;
p4 @host4[1|merg:5] $ my_prog ;
merg@localhost[1|DUP:1] $ gather ;

Note the use of the special label DUP to refer to the dup command itself. If the above code is stored in ep.dup, the distributed version can be invoked using:

$ dup -c ep.dup < input.txt > output.txt

This will, of course, only work if dupd processes are running on localhost and on each of the four hosts.
