Yes. However, DUP itself only ever connects stages using UNIX pipes (when both stages run under the same dupd parent process) or TCP. To get support for another two-party protocol, you should write two small wrapper stages, one client stage and one server stage. These stages should use the uni-directional (!) TCP stream to transmit the server's address to the client; the server stage must therefore run on the sending end of the DUP-created stream. The client stage should then connect to the server. Finally, both stages should close the original TCP stream created by DUP, use dup2 to take over the original file descriptor, and then exec the stage that wants the non-TCP protocol.
With such pairs of stages, you can easily switch to, say, RTP by wrapping the original process. In DUP, the changes would look like this. Before:
l1@h1[...,1|l2:0] $ data-source;
l2@h2[...] $ data-target;
After:
l1@h1[...,1|l2:0] $ rtp-wrapper-server 1 data-source;
l2@h2[...] $ rtp-wrapper-client 0 data-target;
In this example, the extra arguments "0" and "1" specify the file descriptor of the TCP stream that is to be replaced with an RTP stream.
The DUP developers plan to provide pairs of such stages for various transport protocols in the near future.
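The descriptor hand-off performed at the end of such a wrapper stage can be sketched in Python, whose os.dup2 and os.execvp map directly onto the POSIX calls named above. This is only an illustration under stated assumptions: the function names are hypothetical, the replacement connection (RTP or otherwise) is assumed to be already open on replacement_fd, and nothing here is taken from the DUP sources or from the planned wrapper stages.

```python
import os


def take_over_fd(replacement_fd: int, target_fd: int) -> None:
    """Make target_fd refer to the stream currently open on replacement_fd.

    Hypothetical helper: target_fd is the descriptor DUP connected to its
    TCP stream, replacement_fd is the wrapper's own connection.
    """
    if replacement_fd == target_fd:
        return
    # dup2 silently closes whatever target_fd referred to before (here the
    # DUP-created TCP stream) and makes it a copy of replacement_fd.
    os.dup2(replacement_fd, target_fd)
    # The extra descriptor is no longer needed once the copy exists.
    os.close(replacement_fd)


def run_wrapped_stage(replacement_fd: int, target_fd: int, argv: list) -> None:
    """Swap the stream, then replace this process with the real stage."""
    take_over_fd(replacement_fd, target_fd)
    # execvp only returns on failure; on success the stage inherits target_fd.
    os.execvp(argv[0], argv)
```

A client wrapper invoked as in the example above would, after establishing its own connection on some descriptor conn_fd, call something like run_wrapped_stage(conn_fd, 0, ["data-target"]).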
Note that the label at the beginning of each DUP instruction does not have to correspond to the name of the process at all. Also, file-descriptors don't have to be unique in the system. If you want to do the equivalent of
cat - | grep Hello | tr w W | grep World
in DUP, you could use:
g1@localhost[DUP:0|0,1|tr:0] $ grep Hello ;
tr@localhost[1|g2:0] $ tr w W ;
g2@localhost[1|DUP:1] $ grep World ;
The same also applies to stages with multiple input and output streams.
Lookup uses 5 streams, 2 input streams and 3 output streams, which can be confusing. We should first describe what lookup does. Lookup first builds a dictionary from a long list of strings (read from FD 3). Once there are no more inputs on FD 3, lookup starts to read from FD 0 (standard input). It then performs a lookup in the dictionary to see if the given string is present. If so, the string is written to FD 1 (stdout). Strings that do not match are written to FD 4. Eventually there will be no more input on FD 0 (end of stdin). At this point, lookup writes all of the entries in the dictionary that were never matched against to FD 5.
It should be noted that any subset of the FDs 1, 4 and 5 can be closed, in which case the corresponding outputs are simply not generated. Here is a simple example for using lookup directly in bash:
echo -e "a\ne\ni\no\nu" > dict.txt
echo -e "a\nb\ne\ne\nf\n" > input.txt
lookup 3< dict.txt < input.txt > aee.txt 4> bf.txt 5> iou.txt
The equivalent use in DUP would be:
look@localhost[0<input.txt,1>aee.txt,3<dict.txt,4>bf.txt,5>iou.txt] $ lookup ;
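To make the routing between the five streams concrete, here is a small Python model of the behaviour described above. It is a sketch, not the lookup implementation: the function name is hypothetical, lists stand in for the streams, and the order in which unmatched dictionary entries are emitted (dictionary order) is an assumption.

```python
def lookup_model(dictionary_lines, input_lines):
    """Model of the lookup stage using lists instead of file descriptors.

    dictionary_lines plays the role of FD 3, input_lines of FD 0.
    Returns (fd1, fd4, fd5): matches, non-matches, never-matched entries.
    """
    # Phase 1: build the dictionary (order kept only for the FD 5 output).
    dictionary = list(dict.fromkeys(dictionary_lines))
    known = set(dictionary)

    # Phase 2: route each input line to FD 1 or FD 4.
    hit = set()
    fd1, fd4 = [], []
    for line in input_lines:
        if line in known:
            fd1.append(line)
            hit.add(line)
        else:
            fd4.append(line)

    # Phase 3: at end of input, report entries that were never matched.
    fd5 = [entry for entry in dictionary if entry not in hit]
    return fd1, fd4, fd5


# Same data as dict.txt and input.txt above (trailing empty line omitted).
fd1, fd4, fd5 = lookup_model(["a", "e", "i", "o", "u"],
                             ["a", "b", "e", "e", "f"])
print(fd1)  # ['a', 'e', 'e']
print(fd4)  # ['b', 'f']
print(fd5)  # ['i', 'o', 'u']
```

The three results correspond to the file names chosen in the example: aee.txt, bf.txt and iou.txt.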
In general, file descriptors have no specific meaning to DUP itself. However, each stage is free to say that particular file descriptors have a particular meaning. For example, fanout always expects to read from stdin and always reports errors to stderr. All other file descriptors (including stdout, but stdout does not have to be used) are treated equally by fanout.
In general, we recommend that stages read their most important input from stdin (0), write their most important output to stdout (1) and report errors to stderr (2). However, DUP will work fine if you write output to FD 0, read input from FD 1 and report errors to FD 42 and FD 43. This would be a great code obfuscation technique.
The most useful general-purpose DUP stages are:
All of these stages can run either in line-mode or in record-mode (block-oriented). For details about how errors are handled, consult the man pages.
Finally, the dup command itself can be used as a stage, which can be interesting when a component written in DUP is to be used within a larger DUP application. In this case, it is often convenient to invoke dup with the -c option in order to keep standard input and standard output available to be used for input and output streams.
This answer presents the DUP code for the simplest possible case: the work is specified in an input file in which each line represents one unit of work to be processed.
Furthermore, we assume that the stage that does the actual work is written as a simple program that reads the work from standard input and writes the results to standard output. In other words, running the program sequentially can be done using:
$ my_prog < input.txt > output.txt
In this case, we can distribute the input over a number of hosts using DUP's deal stage. This will distribute the input in a round-robin fashion over all of the processing stages. Afterwards, we need to collect the output. If the order of the output does not matter, we can use faninany. If the order needs to be preserved, the gather stage can be used which reads inputs in round-robin fashion. Both faninany and gather operate one line at a time, so the main stage must be written to produce exactly one line of output for one line of input.
Given all of these constraints, the necessary DUP code for four hosts would look like this:
dist@localhost[DUP:0|0,1|p1:0,3|p2:0,4|p3:0,5|p4:0] $ deal ;
p1 @host1[1|merg:0] $ my_prog ;
p2 @host2[1|merg:3] $ my_prog ;
p3 @host3[1|merg:4] $ my_prog ;
p4 @host4[1|merg:5] $ my_prog ;
merg@localhost[1|DUP:1] $ gather ;
Note the use of the special label DUP to refer to the dup command itself. If the above code is stored in ep.dup, the distributed version can be invoked using
$ dup -c ep.dup < input.txt > output.txt
This will of course only work if dupd processes are running on localhost and the four hosts.
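The reason deal followed by gather preserves the input order, provided each stage emits exactly one output line per input line, can be seen in a small Python model. This is a sketch under assumptions: the functions are hypothetical stand-ins that work on lists rather than streams, and the way gather skips streams that have already ended is assumed, not taken from the DUP sources.

```python
from itertools import zip_longest

_END = object()  # marks a stream that has run out of lines


def deal(lines, n):
    """Distribute lines round-robin over n output streams."""
    streams = [[] for _ in range(n)]
    for i, line in enumerate(lines):
        streams[i % n].append(line)
    return streams


def gather(streams):
    """Read one line from each stream in turn, skipping finished streams."""
    merged = []
    for row in zip_longest(*streams, fillvalue=_END):
        merged.extend(item for item in row if item is not _END)
    return merged


def my_prog(line):
    """Stand-in worker: exactly one output line per input line."""
    return line.upper()


work = ["unit%d" % i for i in range(10)]
dealt = deal(work, 4)                       # dist stage
processed = [[my_prog(x) for x in s] for s in dealt]  # p1..p4
result = gather(processed)                  # merg stage
print(result == [my_prog(x) for x in work])  # True
```

If a worker produced zero or several lines for some input, the round-robin read in gather would fall out of step with deal and the output order would no longer match the input order.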
All current components of the DUP System are released under the GNU General Public License (GPL).
Note that this license only applies to the DUP code and the filters shipped with DUP. It is legal for you to write your own applications with the DUP System and to use DUP filters as part of the resulting application. Since all of the DUP filters as well as dup and dupd would be run as separate processes, it is my (and, as far as I know, the FSF's) interpretation of the GPL that DUP and the DUP filters being GPLed imposes no requirements on the licenses for your application or the other filters it may be using. However, if you modify or distribute the C/C++ code of the DUP implementation itself, you will have to redistribute those modifications under the GPL.
After compiling and installing the DUP System, you must first start dupd. You do not need to specify any command-line options at this time. Then, write a simple DUP application, for example like this:
hello@localhost[1|DUP:1]$ echo "Hello World";
Store this line in a file called hello.dup. Running hello.dup will direct the standard output (1) of the echo command to the standard output of the dup command. In order to invoke the command, run:
$ dup < hello.dup
If it works, you should get back the output "Hello World".
Yes.
Since version 0.1.0 the DUP System uses ssh for authentication and to encrypt session information exchanged between nodes. As a result, we believe that it is generally safe for users to run DUP. However, DUP does not encrypt the streams of data exchanged between computational stages (since this would be computationally expensive and likely destroy possible performance gains from distributing the computation). If you exchange data between nodes on the Internet using DUP, you should add openssl stages into the DUP flow graph to encrypt data streams at the appropriate locations.
No, it is easily possible to run a DUP application just on a single operating system. However, even in this case you currently still have to start the dupd background process.
While virtually all DUP applications will run in parallel, parallelism does not have to be the reason for using DUP. DUP may be useful whenever you are writing an application that can easily be broken into components that exchange information using streams.
The DUP System was named after the dup system call on BSD and POSIX operating systems.