This answer presents the DUP code for the simplest possible case: the work is specified in an input file in which each line represents one unit of work to be processed.
Furthermore, we assume that the stage doing the actual work is a simple program that reads its work from standard input and writes its results to standard output. In other words, the program can be run sequentially using:
$ my_prog < input.txt > output.txt
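As an illustration of what such a program might look like, here is a minimal sketch of a line-oriented filter in Python. The names process and main and the upper-casing "work" are placeholders, not part of DUP. The sketch writes one result line per input line and flushes after each one; the flush is a defensive choice, not a documented DUP requirement.

```python
import sys


def process(record: str) -> str:
    """Hypothetical unit of work; must return a single line (no newlines)."""
    return record.upper()


def main(infile=sys.stdin, outfile=sys.stdout) -> None:
    # One line in, one line out: each input line is one unit of work.
    for raw in infile:
        record = raw.rstrip("\n")
        outfile.write(process(record) + "\n")
        # Flush per record so a downstream reader is never left waiting
        # on buffered output (an assumption, not a stated DUP rule).
        outfile.flush()


if __name__ == "__main__":
    main()
```

Saved as an executable named my_prog, this would be run exactly as in the command above.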
In this case, we can distribute the input over a number of hosts using DUP's deal stage, which hands out the input lines in round-robin fashion to all of the processing stages. Afterwards, we need to collect the output. If the order of the output does not matter, we can use faninany. If the order needs to be preserved, we can use the gather stage, which reads its inputs in round-robin fashion. Both faninany and gather operate one line at a time, so the main stage must be written to produce exactly one line of output for each line of input.
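To see why round-robin dealing followed by round-robin gathering preserves order, and why the one-line-per-input rule matters, the behaviour described above can be modelled in a few lines of Python. This is only a model of the description, not DUP's implementation: the functions deal, gather and work are hypothetical, and the handling of exhausted inputs (they are simply skipped) is an assumption.

```python
from itertools import zip_longest


def deal(lines, n):
    """Model of the deal stage: hand out lines round-robin to n workers."""
    queues = [[] for _ in range(n)]
    for i, line in enumerate(lines):
        queues[i % n].append(line)
    return queues


def gather(queues):
    """Model of the gather stage: take one line from each input in turn."""
    missing = object()  # marks inputs that have run out (assumed skipped)
    merged = []
    for row in zip_longest(*queues, fillvalue=missing):
        merged.extend(item for item in row if item is not missing)
    return merged


def work(line):
    """Placeholder for my_prog: exactly one output line per input line."""
    return line.upper()


lines = [f"unit-{i}" for i in range(10)]
expected = [work(line) for line in lines]

# Four workers, each processing only the lines dealt to it.
results = [[work(line) for line in queue] for queue in deal(lines, 4)]
merged = gather(results)
print(merged == expected)  # True: the original order is restored

# If one worker emits an extra line, every later line is shifted.
broken = [list(queue) for queue in results]
broken[0].insert(0, "EXTRA")
print(gather(broken) == expected)  # False
```

The second check shows the failure mode: a single extra output line from one worker puts that worker's stream out of step with the others, so everything after it is merged in the wrong order.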
Given all of these constraints, the necessary DUP code for four hosts would look like this:
dist@localhost[DUP:0|0,1|p1:0,3|p2:0,4|p3:0,5|p4] $ deal ;
p1 @host1[1|merg:0] $ my_prog ;
p2 @host2[1|merg:3] $ my_prog ;
p3 @host3[1|merg:4] $ my_prog ;
p4 @host4[1|merg:5] $ my_prog ;
merg@localhost[1|DUP:1] $ gather ;
Note the use of the special label DUP to refer to the dup command itself. If the above code is stored in ep.dup, the distributed version can be invoked using
$ dup -c ep.dup < input.txt > output.txt
This only works if dupd processes are running on localhost and on each of the four hosts.