CnC: CnC C++ Tutorial

The CnC programming model is quite different from most other parallel programming models in several important ways. Hence, we highly recommend you carefully read the following introduction sections (CnC in a Nutshell) carefully before diving into the hands-on tutorial. It'll take only a few minutes but you'll have a much easier time, more fun and faster success! Promise, it'll pay off!

CnC in a Nutshell

CnC is a new programming model which is different from most others. It designed for creating parallel applications, but not for expressing parallelism explicitly. In CnC the programmer declaratively specifies the dependencies among computation units, but does not in any way indicate how those are to be met. It is specifically designed for addressing the coordination among potentially parallel computation units and data.

In essence, the programmer declares certain dependencies between two (or more) computation units. Specifying the dependencies is simpler than expressing parallelism, because it only makes application semantics explicit - and does not depend on the platform or any specific parallelization technique. Additionally, it exposes more parallelism potential because it does not bind a particular parallelism to the algorithm/program. The CnC runtime will do the hard work of figuring out how to execute things in parallel; it will try to maximize parallelism, limited only by the ordering constraints defined by the programmer.

The only 2 ordering constraints which semantically exist in any program come from the following 2 relationships

producer/consumer
controller/controlee

It is apparent that a producer needs to executed before the consumer can run. Similarly, if one computation unit decides whether another computation needs to be executed or not, the decision maker (e.g. the controller) needs to go before the controlled computation(s).

These two relationships (producer/consumer and controller/controlee) are the only relations needed to determine semantically correct parallel or sequential execution orders. This information is known to any programmer, even when writing sequential programs. It only gets lost because traditional programming languages have no means to explicitly express it. Exactly this is what a CnC program specifies (on top of the computation functionality of course).

Burgers with Fries and Pies for Dessert

Burgers and fries are parallel

Let's make burgers and French Fries. Let's say the ready-to-process ingredients (cut potatoes and prepared meat) get delivered to a service hatch, from which the cook can take them for processing. In CnC such hatches are item-collections, which store input and output data/items. Each collection can hold multiple instances of the same data(-type). In our kitchen that would mean we have a hatch for potatoes and a another hatch for the meat. Similarly, whatever we create and cook (our output) would be placed in such a hatch (item-collection). Our hungry guests can pick up the delicacies from this hatch once they have been produced.

The tasks for our cooks are frying the potatoes and barbecuing the meat. In CnC we call such tasks steps, which are basically just normal functions. The interesting question is: are there any dependencies between the two cooking tasks/steps? Of course not: neither uses the result of the other and neither decides whether the other needs to be done (or not). Hence, there is no ordering required, we can fry the potatoes first, or do the meat first or do them in parallel.

Note:: If we insisted on working one-handed and allowed only one cook active at the same time, we needed to decide on a (arbitrary) sequence. In serial languages, the programmer is required to make that arbitrary choice but doesn't indicate that the choice was optional.

Can't bake a pie until it's prepared

Let's add mini cherry pies to our meal for dessert. A mini-pie has to be prepared and baked (prepare_pie, bake_pie). Altogether, we now have four steps, three of which (barbecue_burger, fry_potatoes, prepare_pie) have no dependencies between them. However, we need the pie in order to bake it. We don't need to know where it comes from, we only need the ready assembled pie: we take it from the hatch once it's there and bake it. Similarly, for the task of preparing a pie we don't need to know what's going to happen with it afterwards, we just create it and put it on the hatch.

Incidentally, there is a producer/consumer relationship between prepare_pie and bake_pie. A producer/consumer relationship constitutes one of the two before-mentioned ordering constraints: the producer needs to be executed before the consumer can use the produced item. Note, the programmer does not explicitly indicate a specific execution order. The runtime takes care of it: as it knows about the relationship, it will never bake a pie before it has been prepared.

Waiter controlling cooks

Now, let's say our guest are individuals and not all of them want the same menu. Some of them might want the full menu: a burger, fries and a pie. Someone else might prefer 2 burgers, no fries but a pie, and so on. So let's take orders (take_orders) first. Only if they know the orders, our cooks can actually start cooking. When taking orders, we must assign a unique identifier (a tag) to each order so that we later know which burgers, fries and pies are for which guest. To communicate the orders with the kitchen, our CnC kitchen uses special bowls, one for each step (barbecue_burger, fry_potatoes, bake_pie and prepare_pie). Whenever a step needs to be executed for a given order, the waiter simply puts the corresponding guest-tag into the step's bowl. For example, if guestA wants a burger and Fries, we put the tag/token 'A' into barbecue_burger's bowl and into fry_potatoes' bowl. Our cooks will then prepare a burger and fries for guest 'A', but as there is no token/tag in the pie bowl, they will not make a pie for 'A'. CnC calls such bowls tag-collections.

We have just declared a controller/controlee relationship. Our waiter decides which steps need to be executed. Moreover, our CnC kitchen assigns a tag (a unique identifier) to each execution instance. With this our chef (e.g. the CnC runtime) can now coordinate the different tasks as they come in (e.g. to make sure burgers and fries from the same order are ready together).

We have a CnC specification!

We have now introduced the dependencies which imply a certain (partial) ordering. This partial ordering allows us (e.g. the CnC runtime) to determine a legal scheduling of the step instances (computation units), be it serial or parallel. If we have ten cooks or one, all they need to know are the ordering constraints (besides knowing how to cook, of course). They could do everything in parallel, except they need to 1. wait for the tags to come in before starting a task 2. wait for each pie to be prepared before baking it.

Note:: The constraints we defined are semantic attributes, they are requirements only of the algorithm. They exist independent of the programming language or programming tool, e.g. even if not using CnC they must be obeyed: a pie simply needs to be prepared before put into the oven. And it doesn't make sense to start frying potatoes unless we know that someone actually wants fries.

Basically we came up with a CnC specification for our kitchen problem. What we have done is

identified the computation units (barbecue_burger, fry_potatoes, prepare_pie, bake_pie and take_orders)
identified the data entities (meat, potatoes, fries, burgers, pies)
identified the control tags (guest-id)
consumer/producer relationships (prepare_pie -> pies -> bake_pie)
controller/controlee relationships (take_orders => xxx_bowls => steps)

That's what makes a CnC specification; paired with an implementation of each computation step it makes an application ready for parallel execution. Nothing else is needed in the CnC world - the rest is handled by the CnC runtime. The CnC approach leaves the CnC runtime with all freedom to execute things in parallel as long as it satisfies the defined semantics.

But...

..of course there are rules when programming in CnC:

Computation units (steps) must execute statelessly, e.g. computation must not access or even alter any global data and leave no traces behind other than putting things into CnC's collections.
Data is immutable. Once put, data-items cannot be altered. instead of changing a value, in CnC you put a new value with a new tag.

The rules might sound more restricting than they actually are. Actually, in this tutorial you will almost certainly start appreciating the rules. They are much easier to follow than the common rules of traditional parallel frameworks. For example, almost all threading frameworks tell the programmer "Don't create races". That's a very vague rule and verifying it's correct implementation is close to impossible. CnC's rules are easy to follow, easy to check and they will actually give race-freedom and determinism without you needing to think about it.

The CnC Recipe

The challenge in using CnC certainly lies in applying its concepts to a given problem (application). Luckily, once that's accomplished, the remaining tasks are straight forward and later changes to the design are relatively simple to make. The thought process to getting to a complete CnC design for your application can happen in several small phases:

Define the data and computation entities of your application (step-, item- and tag-collections)
Define how to distinguish between instances of these entities (what do the identifiers/tags look like?)
Define the relations between the entities (producer/consumer and controller/controlee)

This tutorial will guide you through these phases with a simple example program. It'll explain the use of CnC with the C++ API by creating a program to compute Fibonacci numbers. Due to nature of Fibonacci CnC doesn't necessarily provide the most natural expressiveness for it. However, Fibonacci has the right complexity and characteristics to demonstrate CnC and Intel's C++ API without getting lost in details of non-CnC related issues.

After creating a first program, we will introduce a debugging-interface and more advanced features like a tuning interface providing powerful tuning knobs.

I am ready, let's go!