In principle, every clean CnC program should be immediately applicable to distributed memory systems. With only a few trivial changes, most CnC programs can be made distribution-ready, yielding a single binary that runs on shared as well as distributed memory. Most of the mechanics of data distribution are handled inside the runtime, so the programmer does not need to bother with the gory details. Once a program has been made distribution-ready, it runs on distributed CnC as well as on "normal" CnC (decided at runtime).
Conceptually, CnC allows data and computation distribution across any kind of network; currently CnC supports SOCKETS and MPI.
Support for distributed memory is part of the "normal" CnC distribution, i.e. it comes with the necessary communication libraries (cnc_socket, cnc_mpi). The communication library is loaded on demand at runtime, so you do not need to link against extra libraries to create distribution-ready applications. Just link your binaries like a "traditional" CnC application (explained in the CnC User Guide, which can be found in the doc directory).
Even though it is not a separate package or module in the CnC kit, in the following we refer to the features that are specific to distributed memory as "distCnC".
As a distributed version of a CnC program needs to do things which are not required in a shared memory version, the extra code for distCnC is hidden from "normal" CnC headers. To include the features required for a distributed version you need to
#include <cnc/dist_cnc.h>
instead of
#include <cnc/cnc.h>
If you want to be able to create optimized binaries for shared memory and distributed memory from the same source, you might consider protecting distCnC specifics like this:
#ifdef _DIST_
# include <cnc/dist_cnc.h>
#else
# include <cnc/cnc.h>
#endif
In "main", initialize an object CnC::dist_cnc_init< list-of-contexts > before anything else; parameters should be all context-types that you would like to be distributed. Context-types not listed in here will stay local. You may mix local and distributed contexts, but in most cases only one context is needed/used anyway.
#ifdef _DIST_
    CnC::dist_cnc_init< my_context_type_1
                      //, my_context_type_2, ...
                      > _dinit;
#endif
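For illustration, a minimal sketch of how this fits into "main"; my_context_type_1, its collections, and the tag/item types are placeholders for your own context:

#ifdef _DIST_
# include <cnc/dist_cnc.h>
#else
# include <cnc/cnc.h>
#endif

// hypothetical minimal context; replace with your own context type
struct my_context_type_1 : public CnC::context< my_context_type_1 >
{
    CnC::tag_collection< int >       m_tags;
    CnC::item_collection< int, int > m_items;
    my_context_type_1() : m_tags( *this ), m_items( *this ) {}
};

int main( int argc, char * argv[] )
{
#ifdef _DIST_
    // must be constructed before any context or other CnC object
    CnC::dist_cnc_init< my_context_type_1 > _dinit;
#endif
    my_context_type_1 ctxt;    // create the (distributable) context
    // ... put tags/items as usual ...
    ctxt.wait();               // wait for completion
    return 0;
}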
Even though the communication between processes is entirely handled by the CnC runtime, C++ does not allow automatic marshaling/serialization of arbitrary data types. Hence, if and only if your items and/or tags are non-standard data types, the compiler will notify you about the need for serialization/marshaling capability. If you are using standard data types only, then marshaling will be handled by CnC automatically.
Marshaling does not involve sending messages or the like; it only specifies how an object/variable is packed into and unpacked from a buffer. Marshaling of structs/classes without pointers or virtual functions can easily be enabled using
CNC_BITWISE_SERIALIZABLE( type);
others need a "serialize" method or function. The CnC kit comes with an convenient interface for this which is similar to BOOST serialization. It is very simple to use and requires only one function/method for packing and unpacking. See Serialization for more details.
This is it! Your CnC program will now run on distributed memory!
The above describes the default "single-program" approach for distribution. Please refer to CnC::dist_cnc_init for more advanced modes which allow SPMD-style interaction as well as distributing parts of the CnC program over groups of processes.
The communication infrastructure used by distCnC is chosen at runtime. By default, the CnC runtime will run your application in shared memory mode. When starting up, the runtime evaluates the environment variable "DIST_CNC". The supported values for distributed execution, SOCKETS and MPI, are described in the following sections.
Please see Using Intel(R) Trace Analyzer and Collector with CnC for how to profile distributed programs.
On application start-up, when DIST_CNC=SOCKETS, CnC checks the environment variable "CNC_SOCKET_HOST". If it is set to a number, it will print a contact string and wait for the given number of clients to connect. Usually this means that clients need to be started "manually" as follows: set DIST_CNC=SOCKETS and "CNC_SOCKET_CLIENT" to the given contact string and launch the same executable on the desired machine.
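For example, starting a host that waits for 3 clients, plus the clients on other machines, might look like this (my_cnc_program is a placeholder and <contact-string> stands for the string printed by the host):

on the host:
env DIST_CNC=SOCKETS env CNC_SOCKET_HOST=3 my_cnc_program
on each client machine:
env DIST_CNC=SOCKETS env CNC_SOCKET_CLIENT=<contact-string> my_cnc_program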
If "CNC_SOCKET_HOST" is not a number it is interpreted as a name of a script. CnC executes the script twice: First with "-n" it expects the script to return the number of clients it will start. The second invocation is expected to launch the client processes.
There is a sample script "misc/start.sh" which you can use. Usually all you need is to set the number of clients and replace "localhost" with the names of the machines you want the application(-clients) to be started on. It requires password-less login via ssh. It also gives some details of the start-up procedure. For Windows, the script "start.bat" does the same, except that it starts the clients on the same machine without ssh or the like. Adjust the script to use your preferred remote login mechanism.
CnC comes with a communication layer based on MPI. You need the Intel(R) MPI runtime to use it. You can download a free version of the MPI runtime from http://software.intel.com/en-us/articles/intel-mpi-library/ (under "Resources"). A distCnC application is launched like any other MPI application with mpirun or mpiexec, but DIST_CNC must be set to MPI:
env DIST_CNC=MPI mpiexec -n 4 my_cnc_program
Alternatively, just run the app as usual (with DIST_CNC=MPI) and control the number (n) of additionally spawned processes with CNC_MPI_SPAWN=n. If host and client applications need to be different, set CNC_MPI_EXECUTABLE to the client-program name. Here's an example:
env DIST_CNC=MPI env CNC_MPI_SPAWN=3 env CNC_MPI_EXECUTABLE=cnc_client cnc_host
It starts your host executable "cnc_host" and then spawns 3 additional processes which all execute the client executable "cnc_client".
For CnC, a MIC process is just another process on which work can be computed. All you need to do is build your binary for the MIC architecture and start it on the card(s) as a client process, just like on any other remote node.
Step instances are distributed across clients and the host. By default, they are distributed in a round-robin fashion. Note that every process can put tags (and so prescribe new step instances). The round-robin distribution decision is made locally on each process (not globally).
If the same tag is put multiple times, the default scheduling might execute the multiply prescribed steps on different processes and the preserveTags attribute of tag_collections will then not have the desired effect.
The default scheduling is intended primarily as a development aid: your CnC application will be distribution-ready with only little effort. In some cases it might lead to good performance; in others, a sensible distribution is needed to achieve good performance. See Tuning for distributed memory.
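Where the default round-robin distribution is not good enough, the placement of step instances can be controlled through a step-tuner's compute_on method, as described in the Tuning section. The following is only a sketch under the assumption of integer tags; the tuner name and the simple modulo policy are made up for this example:

// a hypothetical step-tuner mapping each step instance to a process
// by a simple modulo over its (integer) tag
struct my_step_tuner : public CnC::step_tuner<>
{
    template< typename Ctxt >
    int compute_on( const int & tag, Ctxt & /*ctxt*/ ) const
    {
        return tag % numProcs();   // numProcs() is provided by the tuner base
    }
};

Such a tuner is then attached to the step-collection, e.g. as its second template argument (CnC::step_collection< my_step, my_step_tuner >).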