Instructions on how to set up an Ubuntu cluster can be found at https://help.ubuntu.com/community/MpichCluster .
I’ve updated a few outdated commands there myself so it shouldn’t be too hard to follow the instructions. The only thing I personally did differently was that I didn’t create a new user, but instead used my old account on all the machines (the important thing is that the username be the same everywhere).
In this post I’ll explain how to write a Python script that uses this cluster through MPI, the standard for parallel programming.
To prepare your Python interpreter for parallel programming, you first need an MPI interface. Several exist, so it’s up to you to choose; I used mpi4py. It is a standalone package that is compiled against your MPI installation, so you need the Python development headers first. They can be installed through Synaptic or with:
sudo apt-get install python-dev # other potential packages to consider - python-mpi mpichpython python-scipy python-numpy
Then you need to install the mpi4py module itself. Note: we won’t install mpi4py from the Ubuntu repo, because that package depends on OpenMPI and we are using MPICH2. Instead, mpi4py can normally be installed from the Python Package Index by installing pip and using it to fetch the module:
sudo apt-get install python-pip
sudo pip install mpi4py
For me, however, the repository for mpi4py was unavailable, so I had to download the latest version from its Google Code home and install it following these instructions. Basically, it comes down to:
sudo python setup.py install
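Once the install finishes, it may be worth checking which MPI implementation mpi4py was actually built against, since the whole point of skipping the Ubuntu package was to end up on MPICH2 rather than OpenMPI. The following is only a minimal sketch: the helper name `mpi4py_info` is made up for illustration, and it relies on `MPI.Get_version()` and `MPI.get_vendor()` from mpi4py.

```python
def mpi4py_info():
    """Return (MPI standard version, vendor info), or None if mpi4py is missing.

    The function name is hypothetical; only the two MPI.* calls come from mpi4py.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    # Get_version() reports the MPI standard level as (major, minor);
    # get_vendor() reports the implementation name and its version tuple.
    return MPI.Get_version(), MPI.get_vendor()


info = mpi4py_info()
if info is None:
    print("mpi4py could not be imported")
else:
    print("MPI standard version: %s, vendor: %s" % info)
```

If the vendor name is not the MPI implementation you set up on the cluster, mpi4py was probably built against a different MPI and should be reinstalled.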
To be able to run an MPI program, you first need to boot the mpd (for example on 4 hosts):
mpdboot -n 4
Update: booting the mpd is no longer necessary in the new version of MPICH2, after the switch to the Hydra process manager.
You can then run your programs, for example as 6 instances. The number of instances doesn’t have to match the number of hosts; for the machinefile, see https://help.ubuntu.com/community/MpichCluster:
mpiexec -n 6 -f machinefile python program.py
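If you start jobs often, the machinefile contents and the command line above can be generated from Python. This is just a sketch under a few assumptions: the host names `ub0` to `ub3` and the per-host process counts are placeholders, both helper functions are hypothetical, and the `hostname:count` line format is the one I understand the Hydra process manager to accept (check the Ubuntu page above for your setup).

```python
def machinefile_text(hosts):
    """Render (hostname, process_count) pairs as 'hostname:count' lines."""
    return "".join("%s:%d\n" % (name, count) for name, count in hosts)


def mpiexec_command(instances, machinefile, script):
    """Build the argument list for the mpiexec invocation shown above."""
    return ["mpiexec", "-n", str(instances), "-f", machinefile, "python", script]


if __name__ == "__main__":
    # Placeholder hosts; replace them with the names of your own machines.
    hosts = [("ub0", 2), ("ub1", 2), ("ub2", 1), ("ub3", 1)]
    print(machinefile_text(hosts))
    print(" ".join(mpiexec_command(6, "machinefile", "program.py")))
```

The argument list returned by `mpiexec_command` can be passed as-is to `subprocess.call` if you want the script to start the job itself.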
Here’s a sample Python program you can use to get started with mpi4py:
#!/usr/bin/env python
from mpi4py import MPI

comm = MPI.COMM_WORLD  # communicator containing every launched instance
print("Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size))
comm.Barrier()  # wait for everybody to synchronize _here_
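A natural next step after the hello-world program is to hand each rank a share of some work and collect the results. The sketch below is one possible way to do that with mpi4py’s `scatter` and `gather`. The toy task (summing squares of 0..19), the helper `split_evenly`, and the choice of rank 0 as the coordinator are illustrative assumptions, not part of the original setup.

```python
def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def main():
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not installed; nothing to do")
        return
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    # Only rank 0 builds the work list; the other ranks pass None to scatter.
    chunks = split_evenly(list(range(20)), size) if rank == 0 else None
    my_chunk = comm.scatter(chunks, root=0)  # each rank receives one chunk
    partial = sum(x * x for x in my_chunk)
    totals = comm.gather(partial, root=0)  # a list on rank 0, None elsewhere
    if rank == 0:
        print("sum of squares: %d" % sum(totals))


if __name__ == "__main__":
    main()
```

Run it the same way as the sample program, through mpiexec. The lowercase `scatter` and `gather` methods pickle ordinary Python objects, which is convenient for small examples; for large numeric arrays mpi4py also offers buffer-based uppercase variants.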