# Cilk

Paradigm imperative (procedural), structured, parallel MIT Laboratory for Computer Science Intel static, weak, manifest {{#property:P856}} Cilk++, Cilk Plus OpenMP 3.0[1]

Cilk is a programming language specifically designed for multi-threaded applications and parallel computing.

Originally developed in the 1990s at the Massachusetts Institute of Technology, the language is based on ANSI C, with the addition of a small amount of keywords. Later, the language has been extended to C++ as the Cilk++ dialect — a commercial product developed by Cilk Arts. In 2009 the company became part of Intel corporation which continued the development of the language.

Today the most common dialect is Intel Cilk Plus.

## Parallelism levels

• Task parallelism. Often parallelism at this level is most simple and yet efficient. It is possible in cases when the task naturally consists of a number of independent subtasks that can be solved separately in parallel.
• Data parallelism. It is possible in cases when a single opeartion is applied to a number of data elements. This type of parallelism widely occurs in various numerical simulation tasks.
• Algorithm parallelism. It includes algorithms for parallel sorting, matrix multiplication, linear equations systems solving. The task of implementing a parallel algorithm is complex, so there are quite a number of parallel algorithms libraries for different languages that allow to build program without going into the details of parallel algorithm implementation.
• Instruction parallelism. It is the lowest level of parallelism, implemented at the level of parallel execution of several processor instruction. On this level is batch processing of multiple data items with a single processor instruction (MMX, SSE, SSE2, etc.).

The Cilk language supports task parallelism and data parallelism. Thus, the most efficiently parallelized are "divide and conquer" algorithms as well as operations with recursive data structures such as lists and trees.

## Parallelism in Cilk

The main idea of the language is that the programmer should concentrate on structuring the program to "expose" parallelism. The runtime Cilk system takes the responsibility of scheduling the computation to run efficiently on a given platform. Thus, the Cilk runtime system takes care of details like load balancing, paging, and communication protocols. Due to this, Cilk programs can work on systems with different processors count, including single-processor ones. Runtime systen guarantees efficiency and precitable performance.

A distinctive feature of Cilk is that the expansions (new Cilk keywords that are absent in pure C) change the program slightly - if they are removed from the source code, you get the correct C program called serial elision (or C elision) of the Cilk program.

Cilk uses three new keywords in order to organize parallel execution of tasks:

• cilk - marks Cilk procedures (parallel versions of C functions);
• spawn - creates new task by calling Cilk procedure that can be started parallel;
• sync - waits for all spawned tasks to finish and then continues exectuion.

It is important to mention that using keyword spawn the programmer creates and launches a task, instead of writing the code that should be executed in a separate thread. This allows the Cilk task manager to rebalance execution flow - if one of the two threads in existing thread pool is heavily loaded while the second one is idle, the manager can redistribute the load and give part of the fist thread's tasks to the second one.

## Cilk program example

Consider the example program for calculating the value of the N-th Fibonacci Number.

cilk int fib(int n)
{
if (n < 2)
return n;
else
{
int x, y;
x = spawn fib(n-1);
y = spawn fib(n-2);
sync;
return (x+y);
}
}

cilk int main(int argc, char *argv[])
{
int n, result;
n = atoi( argv[1] );
result = spawn fib(n);
sync;
printf("Result: %d\n", result);
return 0;
}

Most part of Cilk procedures are executed sequentially as in C. Parallelism is achieved by using keyword spawn at Cilk procedure call. spawn can be considered a parallel analogue of the standard C function call. Unlike a C function call, however, where the parent is not resumed until its child finishes execution, in the case of a Cilk spawned procedure, the parent can continue to execute in parallel with the child.

A Cilk procedure cannot safely use the return values of the children it has spawned until it executes sync. If at the time of execution at least one child task is not finished, the parent procedure suspends until all tasks are finished. After that, the procedure continues.

sync serves as a local barrier, not global one as in other message-based parallel systems. In Cilk sync awaits for the completion of only those tasks that have been created in the current procedure, but not all running at the moment. In the example above, we need to execute sync before return x+y, since othervise the values of x and y may be not yet calculaed.

Thus, when working with Cilk the programmer uses keywords spawn and sync to expose parallelism in the program, while Cilk runtime system manages its effective implementation. Programs written in Cilk differs minimally from those written in pure C. If the Cilk keywords are removed, the result is a sequential version of a parallel program.

During compilation Cilk implicitly adds the keyword sync before each return. It guarantees that a Cilk procedure will not finish until all tasks spawned by it are finished. The main procedure must be named main, as in C. Unlike C, however, Cilk insists that the return type of main must be int.

It should be also noted that no assignment conversion is done for the assignment operator in a spawn statement, and therefore code such as next would not work:

cilk float foo(void);
cilk void bar(void)
{
int x;
x = spawn foo();
sync;
}

Instead, one must first assign the result to a variable of type float and then use a regular C assignment to convert this variable to an integer.

## Cilk and C interaction

Despite the fact that during compilation Cilk programs are translated to C code, there are some restrictions concerning how the two languages can communicate with each other. Thus, a Cilk procedure may call functions written in C, but C functions can not, in turn, refer to Cilk procedure or spawn tasks. C functions, however, may call special automatically generated stubs that will spawn tasks.

For example, suppose that C function f needs to invoke a Cilk procedure p that accepts and returns a double. For this f first creates a CilkContext object by calling a Cilk-run-time-library function Cilk_init. It then invokes the automatically-generated stub for p, which is called EXPORT(p). After it returns the context object may be released by calling Cilk_terminate. To use this functions one shoud include the cilk.h header file. Creating a Cilk context causes operating-system resources (namely threads) to be allocated, so this operation can incur considerable overhead. Therefore, reusing cilk context is more efficient than creating and terminating a context for each stub invocation.

Cilk_init has two arguments: int *argc, char *argv[] (which are similar to the arguments to main except that argc is passed by reference). Thus, f can control the behavior of the Cilk runtime system in exactly the same way that command-line arguments control the behavior of Cilk in a program whose main is a Cilk procedure.

The following is the source code of the described program.

#include <cilk.h>
cilk double p(double x)
{
return x * 0.1
}
void f()
{
double y;
CilkContext* context;    context = Cilk_init(&amp;argc, argv);
y = EXPORT(p)(context, 3.14);
Cilk_terminate(context);
}
cilk int main (int argc, char *argv[])
{
f();
return 0;
}

## Using shared memory and locks

Cilk allows the procedures to work with shared memory simply by accessing global variables or by passing pointers to procedures:

cilk void bar(int *px)
{
*px = *px+1
printf("%d", *px);
return;
}
cilk int foo(void)
{
int x = 0;
printf("%d", x)
spawn bar(&amp;x);
sync;
printf("%d", x)
return (y);
}

However, when several procedures work with a single memory area, data races can occur that lead to nondeterministic behavior of the program. Cilk provides three ways to solve this problem.

First way is to use a function Cilk_fence(). It guarantees that all memory operations of a processor are committed before the next instruction is executed. However it is not always useful.

The second way is to use locks, which allows to create atomic blocks of code. For this one should use functions Cilk_lock (tests a lock and blocks if it is already acquired) and Cilk_unlock (releases lock). These functions work with variables of type Cilk_lockvar. Such variables should be initialized via Cilk_lock_init:

#include <cilk-lib.h>
...
Cilk_lockvar mylock;
Cilk_lock_init(mylock);
...
{
Cilk_lock(mylock); /* begin critical section */
/* do something */
Cilk_unlock(mylock); /* end critical section */
}

The third way is to use keyword inlet. It allows to define a functinon inside a Cilk procedure. Such function will be capable of atomic execution and changing local variables of this procedure. The body of inlet function may contain operator abort, execution of which signals that previously started tasks can be finished prematurely. The next examples illustrates a function that recursively calculates product of elements in the array, and finishes its execution if at some point the product is equal to zero.

cilk int product(int *array, int size)
{
int prod = 1;
inlet void mult(int x)
{
prod *= x;
if (prod == 0)
abort;
}    if (size == 1)
return array[0];    int s2 = size/2;
mult( spawn product(array, s2) );
if ( prod == 0 )
return 0;
mult( spawn product(a+s2, size-s2) );
sync;
return prod;
}

## Intel Cilk Plus

Intel Cilk Plus is a modern dialect of Cilk, deveoped by Intel and based on the Cilk++ dialect, which, in turn, is an extension of Cilk on C++. The main differences of Cilk Plus from pure Cilk are as follows:

• the cilk, inlet, abort keywords are removed; keywords spawn и sync are replaced with cilk_spawn and cilk_sync;
• in addition to functions, cilk_spawn may also work with object methods, functors (objects with operator () overloaded) and lambda-functions (С++11);
• parallel execution of loops is now possible by using keyword cilk_for;
• it is possible to specify the maximum number of loop iterations that can be executed by a single Cilk strand via #pragma cilk grainsize = count;
• array notation is added to use vector instructions when working with arrays (SIMD processing); fragments of arrays can be used via массив[начало:длина:шаг];
• it is possible to use thread-safe variables based on templates: reducers. The standard library have types that support arithmetical operations, min and max, working with lists, etc. User-defined types are allowed on base of cilk::monoid_base and cilk::reducer.
#include <cilk/cilk.h>

unsigned int compute(unsigned int i)
{
return i;
}

int main(int argc, char* argv[])
{
unsigned long long int n = 1000000;
cilk::reducer_opadd<unsigned long long int> total(0);    cilk_for (unsigned int i = 1; i <= n; ++i)
total += compute(i);    std::cout << "Total " << total.get_value() << std::endl;
return 0;
}

## References

• Template:Cite conference