Ubuntu cluster setup for MPI parallel programming in Python

Instructions on how to set up an Ubuntu cluster can be found at https://help.ubuntu.com/community/MpichCluster .

I’ve updated a few outdated commands there myself, so the instructions shouldn’t be too hard to follow. The only thing I did differently was that, instead of creating a new user, I used my existing account on all the machines (the important thing is that the username is the same everywhere).

In this post I’ll explain how to make a Python script to utilize this cluster using the MPI standard for parallel programming.

To prepare your Python interpreter for parallel programming, you first need an MPI binding for Python. Several exist, so the choice is up to you; I used mpi4py. It is a separate package rather than part of SciPy, and building it requires the Python development headers, which can be installed through Synaptic or with:

sudo apt-get install python-dev # other potential packages to consider - python-mpi mpichpython python-scipy python-numpy

Then you need to install the mpi4py module itself. Note that we won’t install mpi4py from the Ubuntu repositories, because that package depends on OpenMPI and we are using MPICH2. Normally this can be done from the Python Package Index by installing pip and using it to install mpi4py:

sudo apt-get install python-pip
sudo pip install mpi4py

For me, however, the mpi4py package on the index was unavailable, so I had to download the latest version from the project’s Google Code home page and install it following the instructions there. Basically, it comes down to running this in the unpacked source directory:

sudo python setup.py install
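Since the whole point is that mpi4py should talk to MPICH2 rather than OpenMPI, it can be worth checking which MPI library it actually ended up linked against. Here is a minimal sketch (the script name and the describe_mpi function are my own hypothetical names) that uses mpi4py’s MPI.get_vendor() and MPI.Get_version() and simply reports when mpi4py can’t be imported:

```python
#!/usr/bin/env python
# check_mpi.py (hypothetical name): report which MPI library mpi4py is using.


def describe_mpi():
    """Return a string like 'MPICH x.y.z (MPI a.b)', or None if mpi4py is missing."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    name, version = MPI.get_vendor()  # implementation name and version tuple
    major, minor = MPI.Get_version()  # version of the MPI standard supported
    return "%s %s (MPI %d.%d)" % (
        name, ".".join(str(v) for v in version), major, minor)


if __name__ == "__main__":
    print(describe_mpi() or "mpi4py is not installed")
```

Run it with plain python check_mpi.py; if it names Open MPI instead of MPICH, mpi4py was most likely built against the wrong library.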

To run an MPI program with the older mpd process manager, you first need to boot the mpd ring (for example, on 4 hosts):

mpdboot -n 4

Update: booting the mpd is no longer necessary in newer versions of MPICH2, which switched to the Hydra process manager.

You can then run your program, for example as 6 processes (the number of processes doesn’t have to match the number of hosts; see https://help.ubuntu.com/community/MpichCluster for how to write the machinefile):

mpiexec -n 6 -f machinefile python program.py
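To get a feel for how 6 processes can land on fewer hosts, here is a small pure-Python model. It parses a machinefile written as host or host:count lines and fills the hosts’ slots in file order, wrapping around once there are more processes than slots. This fill-in-order policy is a simplifying assumption for illustration, not necessarily the exact placement your MPICH2 version uses, and the host names are hypothetical:

```python
import itertools


def parse_machinefile(text):
    """Turn 'host' or 'host:count' lines into a list of (host, count) pairs."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        name, _, count = line.partition(":")
        hosts.append((name.strip(), int(count) if count else 1))
    return hosts


def place_ranks(hosts, nprocs):
    """Simplified model: assign ranks 0..nprocs-1 to slots in file order,
    cycling through the slots again once they are all used."""
    slots = [name for name, count in hosts for _ in range(count)]
    if not slots:
        raise ValueError("machinefile lists no usable slots")
    return list(itertools.islice(itertools.cycle(slots), nprocs))


if __name__ == "__main__":
    machinefile = "master:2\nnode1:2\nnode2\n"  # hypothetical hosts
    for rank, host in enumerate(place_ranks(parse_machinefile(machinefile), 6)):
        print("rank %d -> %s" % (rank, host))
```

With that machinefile, ranks 0 and 1 go to master, 2 and 3 to node1, 4 to node2, and rank 5 wraps back to master.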

Here’s a sample Python program you can use to get started with mpi4py:

#!/usr/bin/env python

from mpi4py import MPI

comm = MPI.COMM_WORLD # communicator containing every process started by mpiexec

print("Hello! I'm rank %d of %d running in total..." % (comm.rank, comm.size))

comm.Barrier() # wait for everybody to synchronize _here_
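Beyond saying hello, the usual next step is to let each rank work on its own share of the data and then combine the partial results. The sketch below is my own illustration rather than part of the program above: the workload (summing the squares of 0..99) and the my_share helper are hypothetical, the split is round-robin by index, and the partial sums are combined on rank 0 with mpi4py’s comm.reduce. If mpi4py can’t be imported, it falls back to a single process so the logic can still be tried out:

```python
#!/usr/bin/env python


def my_share(items, rank, size):
    """Round-robin split: rank r takes items r, r + size, r + 2*size, ..."""
    return items[rank::size]


def main():
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.rank, comm.size
    except ImportError:
        comm, rank, size = None, 0, 1  # fallback: behave like a single process

    data = list(range(100))  # hypothetical workload
    partial = sum(x * x for x in my_share(data, rank, size))

    if comm is None:
        total = partial
    else:
        # Sum the partial results of all ranks; only rank 0 gets it, others get None.
        total = comm.reduce(partial, op=MPI.SUM, root=0)

    if rank == 0:
        print("sum of squares: %d" % total)
    return total


if __name__ == "__main__":
    main()
```

Started with mpiexec -n 6 -f machinefile python program.py, each rank sums its own share and only rank 0 prints the total (328350); run as a single process, the script prints the same result.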

Published by

metakermit


2 thoughts on “Ubuntu cluster setup for MPI parallel programming in Python”

  1. I’m using mpich1 and already pip installed mpi4py in my machines but I get the error:

    bash: /home/mpiuser/anaconda/bin/hydra_pmi_proxy: No such file or directory

    According to what I’ve read, I need to have hydra_pmi_proxy on all my machines, but only one of my machines has Anaconda. Is there a way to work around this other than copying the entire anaconda folder to all my machines?
