Monday, November 28, 2011

Compiling MPI.NET under Ubuntu Oneiric

I recently wanted to experiment with parallel programming and - since my language of choice is C# - thought it might be interesting to try out using Message Passing.  There are two main libraries available: The Message Passing API (MPAPI) and MPI.NET.  MPAPI provides an independent framework (that does not rely on the Message Passing Interface (MPI) or MPI-2 standards), while MPI.NET provides low-level bindings to the MPI-2 standard directly.

I have done some previous work with MPAPI and found it to be a capable framework that I would highly recommend.  However, that initial choice was because I had significant difficulty compiling MPI.NET under Ubuntu.  I've since resolve those difficulties, so today we will discuss building MPI.NET under Ubuntu (Oneiric).

Step 1: Download MPI.NET version 1.0.0 from the MPI.NET download page.  You want the mpi.net-1.0.0.tar.gz package (source code for Unix).

Step 2: Untar the files and enter the directory:
tar xzvf mpi.net-1.0.0.tar.gz
cd mpi.net-1.0.0
Step 3: Install build prerequisites
sudo apt-get install mono-complete libtool automake autoconf build-essential
Step 4: Install an MPI framework

This is subject to some choice.  Two good libraries are OpenMPI and MPICH2 which you can install through apt get via one of (do not execute both) of the following commands:
sudo apt-get install libopenmpi-dev openmpi-bin openmpi-doc
or
sudo apt-get install mpich2 libmpich2-dev
 Step 5: Create a symbolic link from /usr/bin/ilasm to /usr/bin/ilasm2.  For some reason that is not clear to me MPI.NET looks for ilasm2 which is not present (wrapped into ilasm?) in later versions of Mono.
ln -s /usr/bin/ilasm /usr/bin/ilasm2
Step 6: Download and install patches

Several source files need to be patched to allow successful compilation (see this helpful mailing list posting).  Download Makefile.am.patch, Unsafe.pl.patch, and configure.ac.patch from this repository.  Save them in the mpi.net-1.0.0 directory.

Apply the patches with the following three commands:
patch configure.ac < configure.ac.patch
patch MPI/Unsafe.pl < Unsafe.pl.patch
patch MPI/Makefile.am < Makefile.am.patch
These should all complete successfully.  If not, check the exact filename you used when you saved the patch.  Rename them (or modify the above three commands), as appropriate.

Step 7: Run the standard compilation sequence.  Everything should complete successfully:
sh autogen.sh
./configure
make
make install
The desired MPI.dll and MPI.dll.config is in the MPI subdirectory of the mpi.net-1.0.0.

Step 8: Try creating a hello world program to verify MPI.NET works.

In a new folder, create a source file (Hello.cs) containing a Hello World program similar to the one shown in the MPI.NET tutorial, such as:

using System;
using MPI;

class MPIHello
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
           Console.Write("Greetings from node {0} of {1}\n",
                         Communicator.world.Rank,
                         Communicator.world.Size);
        }
    }
}

Copy MPI.dll and MPI.dll.config to this folder and compile under mono:
gmcs Hello.cs -reference:MPI.dll - target:exe -out:Hello.exe
Compilation should succeed without errors or warnings.

Run the resultant program using mpiexec to launch four threads:
mpiexec -np 4 ./Hello.exe
 If the resultant output is (the order of lines may vary), then everything is working as expected:
Greetings from node 0 of 4
Greetings from node 2 of 4
Greetings from node 1 of 4
Greetings from node 4 of 4
If all this worked, you should now have a working installation of MPI.NET and can use the MPI-2 framework to write parallel and distributed programs.

Further reading

Sunday, November 27, 2011

Rapid Bitmap Access using "Unsafe" Code (Pointers) in C#

Today we discuss efficient programmatic writing of Bitmaps in C#.

I've spent a fair amount of time developing computer-generated fractal art (link and link).  For example, this is one of my Newton fractals (which is derived from the time it takes for Newton's method to converge to a zero from various starting locations in the complex plane given a specified polynomial):
The Eye
One prerequisite is the ability to efficiently set pixels in a C# Bitmap object.  Today, I'll show a trick that I've used - that I'm sure I learned elsewhere, but have long since forgotten where - to speed up generating Bitmaps compared to a naive method.

A naive method to setting every pixel in the Bitmap is:

public static Bitmap drawNaive(Color[,] data)
{            
    int height = data.GetLength(0);
    int width = data.GetLength(1);
    
    Bitmap result = new Bitmap(width, height);
    for(int row = 0; row < height; row++)
    {
        for(int col = 0; col < width; col++)
        {
            Color color = data[row, col];
            
            //Set the pixel using the exposed method by the 
            // Bitmap class (slow!)
            result.SetPixel(col, row, color);
        }
    }
    
    return result;
}

However, this solution is relatively slow because it requires an expensive operation to lock and unlock the bitmap every time a pixel is set.  Must better performance can be had by using pointers (which is permitted when "unsafe" code is used).  Bitmaps are very simple objects that hold the intensity of Red, Green, Blue and the Alpha (transparency) channel at each pixel in four bytes (1 byte each) from the top-left corner, across each row, and then down the image to the bottom right corner.  Our strategy will be:
  1. Declare a struct representing a pixel
  2. Lock the bitmap once at the start
  3. Use pointers to scan through the image, updating pixels
  4. Unlock the bitmap once at the end
First we declare the struct for the pixel.  Note that the order of the fields matters and are chosen to match the way the Bitmap object's pixels are laid out in memory.

public struct PixelData
{
    public byte B;
    public byte G;
    public byte R;
    public byte A;
    
    public PixelData(byte r, byte g, byte b, byte a)
    {
        this.R = r;
        this.G = g;
        this.B = b;
        this.A = a;
    }
}

This can be used then to sweep along the Bitmap.

public static Bitmap drawPointer(Color[,] data)
{
    int height = data.GetLength(0);
    int width = data.GetLength(1);
    
    Bitmap result = new Bitmap(width, height);
    
    //Lock the entire bitmap (the rectangle argument) once
    BitmapData locked = result.LockBits(new Rectangle(0, 0,
                                        result.Width, result.Height),
                                        ImageLockMode.ReadWrite,
                                        PixelFormat.Format32bppArgb);
    
    //Enter unsafe mode so that we can use pointers
    unsafe
    {
        //Get a pointer to the beginning of the pixel data region
        //The upper-left corner
        PixelData* pixelPtr = (PixelData*)(void*)locked.Scan0;
        
        //Iterate through rows and columns
        for(int row = 0; row < height; row++)
        {
            for(int col = 0; col < width; col++)
            {
                Color color = data[row, col];
                
                //Set the pixel (fast!)
                *pixelPtr = new PixelData(color.R,
                                          color.G,
                                          color.B,
                                          color.A);
                
                //Update the pointer
                pixelPtr++;
            }
        }
    }
    
    //Unlock the bitmp
    result.UnlockBits(locked);
    
    return result;
}

This results in a significant speed-up.

Friday, November 25, 2011

Memoization of recursive functions in C#

Today we discuss strategies for memoization of recursive functions in C#.

Consider the problem of computing the n'th Fibbonaci number (1, 1, 2, 3, 5, 8, 13, ...).  This problem can cast as a recursive function, namely:
  • F(0) = 0
  • F(1) = 1
  • F(n) = F(n-2) + F(n-1) for all n > 1
This can be computed by a simple recursive function:

static BigInteger simpleFib(int n)
{
    if( n < 1 )
        return 0;
    if( n == 1 )
        return 1;
    
    return simpleFib(n-2) + simpleFib(n-1);
}

However, this function does not yield good performance. Because of the recursive step for computing any function n > 2, it re-calculates (expensively!) many calls. This problem can be solved with memoization: the program will keep track of the values for results of the function in a static cache (using the Dictionary<,> object).  This generates a significant speedup:

static Dictionary<int, BigInteger> cache = new Dictionary<int,BigInteger>();
//Set up the cache in Main() with cache[0] = 0 and cache[1] = 1;

static BigInteger recursiveFib(int n)
{
    //Return a value from the cache if available
    BigInteger result;
    if( cache.TryGetValue(n, out result) )
        return result;
    
    //Otherwise recursively calculate, add to cache and return
    result = recursiveFib(n-2) + recursiveFib(n-1);
    cache.Add(n, result);
    return result;
}

Although achieving good runtime performance, this function is limited by the size of the callstack.  We can shift the memory burden away from the callstack and on to the heap by converting the recursive call into a loop, keeping track of the arguments to the current 'recursive call' being executed by the loop using a Stack<int>.  The strategy will be to instead call Stack.Push(n) for values of N that need to be calculated and saved in the memoized cache.  There are three cases:
  1. The value is already in the memoized cache.  If so, continue with the next item on the stack immeditially.
  2. The value is not in the cache, but the recursive terms (F(n-2) and F(n-1)) are already known in the cache.  If so, calculate F(n) immeditially, save the the cache and continue with the next item on the stack.
  3. Either F(n-2) or F(n-1) (or both) is not known in the cache.  In that case, (a) push the current n back on to the stack so that it will be re-examined.  Then push n-2 and n-1 on to the stack, if their value is not known so that their value will be computed first (before the stack reaches n again).
When the loop terminates, the value of F(n) must be in the memoized cache, so it can be returned directly.  The code for this is:

static Dictionary<int, BigInteger> cache = new Dictionary<int,BigInteger>();
//Set up the cache in Main() with cache[0] = 0 and cache[1] = 1;

static BigInteger fib(int n)
{
    Stack<int> parameterStack = new Stack<int>();
    parameterStack.Push(n);

    while( parameterStack.Count > 0 )
    {
        int current = parameterStack.Pop();
    
        //Continue if already in the cache
        BigInteger result = 0;
        if( cache.TryGetValue(current, out result) )
            continue;
        
        //Are the fib(n-1) an fib(n-2) already in the cache?
        BigInteger oneResult;
        BigInteger twoResult;
        bool twoInCache = cache.TryGetValue(current - 2,
                                            out oneResult);
        bool oneInCache = cache.TryGetValue(current - 1,
                                            out twoResult);
        
        //If so, remember this result and return
        if( twoInCache && oneInCache )
        {
            cache.Add(current, oneResult + twoResult);
            continue;
        }
        
        //Otherwise, push the value back onto the stack 
        //(plus either n-2 and n-1) to calculate recursively
        parameterStack.Push(current);
        if( !oneInCache ) parameterStack.Push(current - 1);
        if( !twoInCache ) parameterStack.Push(current - 2);
    }
    
    //Return the value from the cache
    return cache[n];
}

This achieves the best memoized performance without triggering stack overflows.

Note that for the Fibonacci sequence there is a simple, trivial O(n) way to calculate it [i.e. starting from 1, 1, ... and generating the sequence up to n).  It has simply been used here as a simple example of a recursive function that can be computed with memoization.  There are other functions (such as the Ackermann function), for which memoization may be the best strategy to compute the result.