Raspberry Pi Cluster Setup Detailed Guide
- Setting up a Raspberry Pi Cluster
- Command Terminals
- Materials
- Connecting the Pi-s
- Setting Up the Main Node
- Setting up the worker nodes
- A Quick Benchmark
- Wee Archie
Setting Up A Raspberry Pi Cluster
These instructions will help you build and set up a small cluster of Raspberry Pi computers to use as a small parallel system. The instructions take you through the process of connecting the computers together, setting up the environment and checking that it works correctly.
These instructions are not for complete beginners with computers - they assume knowledge of concepts such as networks, the command line and file systems. This document includes some explanations, but you will still need some experience and a willingness to try things out.
Note: these instructions deal with a cluster of 5 Raspberry Pis but the cluster can be of any size you want (minimum of 2 as otherwise it is not a parallel computer).
Note: Raspbian refers to the version of Linux commonly distributed with the Raspberry Pi computers.
Note: This is an initial version of these instructions - some parts may need to be updated or changed to make them clearer. If you try these instructions and find any issues or have suggestions, submit an issue to Cluster Instructions.
Command Terminals
This document refers to command terminals throughout. A command terminal provides a command line interface to a user; it is how you access the Raspberry Pis, configure them and tell them what you want them to do.
Sometimes they will have light text on a dark background - you can configure them to your own liking. Regardless, a Raspberry Pi uses the Linux operating system and will have a terminal that functions and behaves in the same way this document describes.
If you are using Windows, you may need to install an SSH client to connect to the Raspberry Pi. PuTTY is a free SSH client.
On a Linux system, you will need to open a command terminal on your computer - you may be able to get this using a right mouse click and choosing "Terminal", or from a list of programs in the program menu.
On Windows you will need to start your SSH client.
On a Mac OS machine, Terminal is under Utilities in the Applications folder.
Materials
Required
- 5 Raspberry Pis (Note the examples used in these instructions are Raspberry Pi 3s - the instructions should work for most Raspberry Pi models - if problems are encountered please submit an issue to Cluster Instructions).
- 5 Power Supplies/1 Multisupply Unit.
- 1 Switch (8 Ports).
- 6 Ethernet Cables.
- 5 USB Cables.
- Laptop or desktop PC with shareable ethernet connection.
Optional
Suitable and Unsuitable Raspberry Pi Models
Note: There are many products with the Raspberry Pi name, so this is a list of the ones that are suitable and unsuitable for these instructions.
Suitable Raspberry Pi Models (these instructions work for these types of Raspberry Pi):
- Raspberry Pi 3
- Raspberry Pi 2
Unsuitable Raspberry Pi Models (these instructions will not work or will require substantial changes and/or additional hardware):
- Raspberry Pi Zero
- Raspberry Pi Zero W
- Raspberry Pi Compute Module
Materials used in these instructions:
- 5 Raspberry Pi 3.
- 1 Multi-USB Power Socket. Note - This model appears to have been discontinued.
- 1 Netgear GS108E Switch (8 Ports).
- 6 Ethernet Cables.
- 5 USB Cables (A-Type USB Male to B-Type Micro-USB Male).
- 1 Macbook with Wifi internet connection shared via Ethernet.
- 5 Red Lego-Style Raspberry Pi cases.
Important Note - Only for New Raspbian Installations
Due to changes in the recent versions of Raspbian and the NOOBS distributions, to follow these instructions you have to make a change to your Raspberry Pi SD Card before you boot it up for the first time.
The newer versions of Raspbian have SSH access disabled by default. You need to reenable it to follow the instructions below.
This can be done by inserting each microSD card containing Raspbian into your computer and creating a blank file called ssh (with no file extension) in the boot partition. This will be the disk that appears in your file system when you insert the microSD card.
Once you have created the file, you will be able to boot Raspbian with SSH remote access enabled.
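As a rough sketch of this step from a Linux or macOS terminal, the empty file can be created as shown below. The helper name enable_ssh_on_boot and the path /Volumes/boot are illustrative assumptions only; the boot partition's mount point varies between systems, so check where the card appears on yours.

```shell
#!/bin/sh
# Create the empty marker file that tells Raspbian to enable SSH on boot.
# Usage: enable_ssh_on_boot <path-to-boot-partition>
# (enable_ssh_on_boot is a hypothetical helper, not part of Raspbian.)
enable_ssh_on_boot() {
    boot_dir="$1"
    if [ ! -d "$boot_dir" ]; then
        echo "boot partition not found: $boot_dir" >&2
        return 1
    fi
    # The file is named "ssh" with no extension; its contents are ignored.
    : > "$boot_dir/ssh"
}

# Example (assumed macOS mount point; yours may differ):
# enable_ssh_on_boot /Volumes/boot
```

Repeat this for each microSD card before its first boot.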
The Hardware
At the time of writing there are three types of Raspberry Pi on the market - to distinguish between them we provide the following images:
If you are using cases - this is the point where you should put the Raspberry Pi into its case. An example of a cased Raspberry Pi would look like this:
Connecting the Raspberry Pi Cluster
- Connect the Ethernet cables to the Ethernet ports on the Raspberry Pis.
- Connect the Ethernet cables to the switch.
- Connect your laptop or desktop to the switch via an Ethernet cable. (Note that while the Raspberry Pi 3 has a built-in wireless network, we are not using this and at this point it probably would not have been set up.)
- Now all the network cables are connected, we need to put the power cables in. First connect the power cable for the switch, then turn on the switch.
- Connect the micro USB end of the USB cables to each Raspberry Pi.
- Connect the other end to your power supply; in this case we are using an externally powered USB hub.
- Plug in the power supply and turn it on. All the Pi units should have LED lights on or flashing.
Now we have put all the hardware connections in, it is time to start configuring the software.
Network Software Setup
Note: These instructions assume you are using a computer connected to a WiFi network and that this computer has a free Ethernet port to connect to the cluster.
First on your laptop/desktop you will have to configure the internet sharing:
Mac OS
- Go to System Preferences
- Choose Sharing
- Click on Internet Sharing and choose share from Wifi to Ethernet
This will start a service which shares the connection from the WiFi across the Ethernet connection (through the switch) to the Raspberry Pis.
Windows 10
Share Network Connection
- Open Settings
- Choose Network and Internet
- Choose "Network and Sharing Center"
- Choose "Change Adapter Settings"
- Select the Wi-Fi connection (this may be called something else on your computer but should be identifiable as being connected to a wireless network by a signal strength in the icon).
- Right click and choose "Properties"
- Select the Sharing tab
- Under Sharing, tick the box at "Allow other network users to connect through this computer's Internet connection".
- Ensure that the Home networking connection is your Ethernet network in the box below.
- Your internet is now shared but you will need to use Nmap to get the attached IP addresses for the Raspberry Pis.
- In the Network Connections window where you selected the Wi-Fi connection, there should be an Ethernet connection; right click on it and choose Properties.
- In the list headed "This connection uses the following items", click on the one called "Internet Protocol Version 4".
- A Properties button will become enabled just below the list - click on it.
- Take a note of the IP Address and Subnet Mask listed.
- You can close these dialog boxes.
- Now follow the Windows instructions in the "Getting into the Pi" section to find the Raspberry Pi IP addresses.
Linux TODO
Need to fill in
Getting into the Pi
The next step is to find out the IP addresses of your Raspberry Pis. For the setup used here, the shared network connection showed up as an adapter called Bridge100 after running ifconfig. This is what allows the WiFi connection to be shared to the Ethernet connection.
Note: the Bridge100 may be using some other name on your computer. To check which interface is created - type ifconfig before turning on your sharing or plugging in your adapter then run it again afterwards to find out what difference the changes made.
For a MacOS computer you need to find out the Bridge100 IP address. This is the most likely interface you are using for ethernet although some systems will vary.
Open a command terminal and type:
ifconfig
This should output something like:
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
    options=3<RXCSUM,TXCSUM>
    inet6 ::1 prefixlen 128
    inet 127.0.0.1 netmask 0xff000000
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    nd6 options=1<PERFORMNUD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether a4:5e:60:e7:21:0f
    inet6 fe80::a65e:60ff:fee7:210f%en0 prefixlen 64 scopeid 0x4
    inet 172.20.152.14 netmask 0xfffff000 broadcast 172.20.159.255
    nd6 options=1<PERFORMNUD>
    media: autoselect
    status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
    options=60<TSO4,TSO6>
    ether 6a:00:00:41:c6:f0
    media: autoselect <full-duplex>
    status: inactive
en2: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
    options=60<TSO4,TSO6>
    ether 6a:00:00:41:c6:f1
    media: autoselect <full-duplex>
    status: inactive
bridge0: flags=8822<BROADCAST,SMART,SIMPLEX,MULTICAST> mtu 1500
    options=63<RXCSUM,TXCSUM,TSO4,TSO6>
    ether a6:5e:60:7e:1d:00
    Configuration:
        id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
        maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
        root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
        ipfilter disabled flags 0x2
    member: en1 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 5 priority 0 path cost 0
    member: en2 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 6 priority 0 path cost 0
    media: <unknown type>
    status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
    ether 06:5e:60:e7:21:0f
    media: autoselect
    status: inactive
awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
    ether ca:90:43:20:19:ba
    inet6 fe80::c890:43ff:fe20:19ba%awdl0 prefixlen 64 scopeid 0x9
    nd6 options=1<PERFORMNUD>
    media: autoselect
    status: active
vboxnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    ether 0a:00:27:00:00:00
    inet 192.168.59.3 netmask 0xffffff00 broadcast 192.168.59.255
vboxnet1: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether 0a:00:27:00:00:01
You are most likely to be looking for the bridge100 interface:
bridge100: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    options=3<RXCSUM,TXCSUM>
    ether a6:5e:60:7e:1d:64
    inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
    inet6 fe80::a45e:60ff:fe7e:1d64%bridge100 prefixlen 64 scopeid 0xe
    Configuration:
        id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
        maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
        root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
        ipfilter disabled flags 0x2
    member: en4 flags=3<LEARNING,DISCOVER>
            ifmaxaddr 0 port 10 priority 0 path cost 0
    Address cache:
    nd6 options=1<PERFORMNUD>
    media: <unknown type>
    status: inactive
The IP address here is 192.168.2.1 (the line beginning with inet above).
There are two ways to find out what is connected: the first is to run the command arp.
arp -i bridge100 -a
This runs a program which returns information about what is currently connected to that interface. Command breakdown:
- arp - the program to run
- -i value - use the interface value
- -a - display all current connections
When you run this command you will get output which contains lines like:
? (239.255.255.250) at 1:0:5e:7f:ff:fa on bridge100 ifscope permanent [ethernet]
Running the arp program will give you a list of connected IP addresses on your Bridge100 interface. This is the list of Raspberry Pi addresses - there should be the same number of addresses as you have Raspberry Pis connected. Entries for multicast addresses, such as the 239.255.255.250 line above, are not Raspberry Pis and can be ignored.
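To pick the addresses out of the arp output, a small filter along the following lines may help. This is a minimal sketch: the function name list_arp_ips is hypothetical, it assumes the macOS output format shown above (address in parentheses), and the result may still include your own computer's address.

```shell
#!/bin/sh
# Read `arp -a` style output on stdin and print one candidate IP per line,
# skipping multicast (224.x to 239.x) and broadcast-style (.255) entries.
# (list_arp_ips is a hypothetical helper name.)
list_arp_ips() {
    sed -n 's/^[^(]*(\([0-9.]*\)).*/\1/p' |
        grep -v -E '^(22[4-9]|23[0-9])\.' |
        grep -v -E '\.255$'
}

# Example:
# arp -i bridge100 -a | list_arp_ips
```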
The other way is to install a program like nmap, which is available from https://nmap.org/.
To run nmap to scan on your network you can do it in two ways:
- Scan a range of IP addresses:
nmap 192.168.2.1-255
Use this option if you know what range your connection sharing will use; replace the IP address range with your own values.
- Scan a subnet on the network:
nmap 192.168.2.0/24
Use this option if you only know the start of the IP address range (e.g. 192.168.2) and it will scan over the whole subnet.
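The subnet form can be derived from any address on the shared network, for example the bridge100 address found earlier. The sketch below assumes a 255.255.255.0 subnet mask, as in the examples in this guide; subnet24 is a hypothetical helper name.

```shell
#!/bin/sh
# Turn an IPv4 address into the /24 subnet that contains it,
# e.g. 192.168.2.1 becomes 192.168.2.0/24.
# Assumes a 255.255.255.0 subnet mask (subnet24 is a hypothetical helper).
subnet24() {
    ip="$1"
    # Strip the last octet and replace it with 0/24.
    echo "${ip%.*}.0/24"
}

# Example:
# nmap "$(subnet24 192.168.2.1)"
```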
Windows
Install Nmap (available at https://nmap.org/) if you are using Windows - this is the easiest way to determine the IP addresses. Using the Ethernet IP address you noted before when sharing the Internet Connection, run the command, replacing the IP address with yours but change the last number to 0:
C:\>nmap 192.168.137.0/24
This should give output like:
C:\>nmap 192.168.137.0/24
Starting Nmap 7.40 ( https://nmap.org ) at 2017-04-26 15:53 GMT Daylight Time
Nmap scan report for 192.168.137.225
Host is up (0.00013s latency).
All 1000 scanned ports on 192.168.137.225 are closed
MAC Address: 98:5A:EB:C7:15:8B (Apple)
Nmap scan report for 192.168.137.1
Host is up (0.0013s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
135/tcp open msrpc
139/tcp open netbios-ssn
445/tcp open microsoft-ds
Nmap done: 256 IP addresses (2 hosts up) scanned in 23.48 seconds
C:\>
Every IP address that appears on a line starting with "Nmap scan report for" will be one of your Raspberry Pis, except for the one matching the IP address you wrote down earlier, which is your own computer.
Now we know what IP addresses are assigned to the Pis.
IP addresses are what we use to identify a Pi on the network and as a target to log in to - it's the same as a street address when going somewhere - though IP addresses can change.
Now that our Pis are started and we know their current IP addresses, we can begin configuration and setup. Note: These IP addresses are only valid with the current connection to your computer. After the setup is complete, the Raspberry Pis will be assigned addresses by the cluster server.
For example, my Pi addresses are:
192.168.2.18
192.168.2.19
192.168.2.20
192.168.2.21
192.168.2.22
I am going to make the Pi with address 192.168.2.18 the master node in the cluster and the other Pis will be worker nodes.
The master node in our clusters will run the network assigning IP addresses via the DHCP server, share its Wifi connection (if used), share a networked drive for files and start the programs we will run on the cluster.
The worker nodes will do the actual computation in any programs.
Setting Up the Main Node
We will start with the master node.
In your command terminal type the following command, replacing the 192.168.2.18 with your IP address for your Pi:
ssh pi@192.168.2.18
You will get output like this coming up:
The authenticity of host '192.168.2.18 (192.168.2.18)' can't be established.
RSA key fingerprint is a0:97:1f:d6:df:8c:a5:45:db:fe:b4:94:46:1d:0f:48.
Are you sure you want to continue connecting (yes/no)?
Type yes in and the following will appear:
Warning: Permanently added '192.168.2.18' (RSA) to the list of known hosts.
pi@192.168.2.18's password:
Type in the password: raspberry
Now you have logged into your new master node for your cluster.
To break down what happened, you typed in
ssh pi@192.168.2.18
This is a command which runs the ssh client on your computer and tries to connect to a system at address 192.168.2.18 with the user name pi. This is a standard way of logging into a remote system - like a supercomputer such as ARCHER.
You will then get output like:
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri Feb 26 03:25:51 2016
pi@raspberrypi:~ $
Let's do some main system configuration:
Type the command sudo raspi-config and press Enter/Return, e.g.:
pi@raspberrypi:~ $ sudo raspi-config
This will run a provided Raspberry Pi configuration program.
The sudo command means that the command following it will be run as the super-user, which has permission to make system changes rather than just changes to a user's own files.
When a menu appears:
- Choose option 3 Boot Options.
- Now choose option B1 Text console and hit enter/return.
This will mean that the Raspberry Pi will always boot to a command line rather than a GUI (that is a Graphical User Interface which is what we normally use in everyday use of computers).
- Now choose the option 9 Advanced Options.
- Choose the Option A2 Hostname and hit enter/return.
An information screen will appear - read it before continuing.
On this screen, there is a hostname, an easy to recognise name, for the Raspberry Pi you are logged into. To make it more identifiable and customise your cluster, you can choose a name for the Raspberry Pi you are logged into.
In my example I will change it to beira and hit enter/return to confirm it.
We need to set the right time zone so that the date and time on the Pi is correct.
- Choose Option 5 Internationalisation Options.
- Choose the T2 Change Timezone option.
- From the list choose your area, for me it is Europe.
- Then choose your closest location, I chose London.
Once back in the main menu, use the cursor/arrow keys to move to the button which says Finish and hit enter/return.
It will ask if you want to reboot - choose yes and reboot this Pi.
Your ssh session with the Raspberry Pi will end. The Raspberry Pi will take a little time to reboot before you can log in again to continue. Have patience here.
Now log back into the Raspberry Pi using the ssh command previously used and put in the password.
Now you will note that the command line says pi@your-hostname-here so my example shows
pi@beira:~ $
We will now install some updates on the Raspberry Pi main software. The first command to run is rpi-update.
pi@beira:~ $ sudo rpi-update
This will update the Raspberry Pi with the latest version of the core software and firmware for the system - it will take a few minutes and start with output like:
*** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
*** Performing self-update
*** Relaunching after update
*** Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
#############################################################
WARNING: This update bumps to rpi-4.9.y linux tree
Be aware there could be compatibility issues with some drivers
Discussion here:
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=167934
##############################################################
Would you like to proceed? (y/N)
If you do not want to perform this update, choose N, otherwise type y. You will need to reboot the Raspberry Pi for this to take effect.
pi@beira:~ $ sudo reboot
This will take a little time.
So far what has been done:
- The hardware and cables have been connected together.
- We shared our internet connection.
- We chose a master node and logged into it.
- We chose a hostname and updated the core software.
Now log back into the node you are working on.
Before we run any more software updates or install new software we will begin to setup the passwordless login system that the cluster will need to run programs.
At the command line run the following:
pi@beira:~ $ ssh-keygen -t rsa
This will generate what is called a public/private key pair - a system used for encryption and security that allows information to be shared in private. It produces a public file, which can be given to any other machine or person you want to communicate with, and a private file, which has to be kept secret. Because only you hold the private file, other people know that information signed with it came from you, and information they encrypt with your public key can only be read by you.
Use the default settings for the first prompt and don’t put in a passphrase for this cluster (just press return/enter when you are asked for a pass phrase). Your interaction should look a little like:
pi@beira:~ $ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/pi/.ssh/id_rsa):
Created directory '/home/pi/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/pi/.ssh/id_rsa.
Your public key has been saved in /home/pi/.ssh/id_rsa.pub.
The key fingerprint is:
9b:98:c7:86:17:0a:1e:32:95:65:ee:1c:0f:48:48:ef pi@beira
The key's randomart image is:
+---[RSA 2048]----+
| .... o |
| .o * |
| = + |
| o o + |
| o E o S |
| + o * + |
| . = B |
| + |
| |
+-----------------+
pi@beira:~ $
You’ve set up an SSH key - remember how to do this as you will be doing it again later. Now we will update some of the operating system and installed software. Run the command:
sudo apt-get update
This will pull information about the latest versions of packages available for the Raspberry Pi operating system. It will take about 30-50 seconds to run.
Now we will run:
sudo apt-get upgrade
This will upgrade the installed software to its latest versions - which should include bug fixes and improvements.
When the prompt asks whether you want to continue, enter ‘Y’ and it will install the upgrades.
Depending on the number of upgrades it could take a long period of time to run this command.
Once the update has finished, we will start to set up the Network File System (NFS) server component.
First thing we want to do is to add a network module, so run the following command:
sudo nano /etc/modules
This should open up a file like this:
Using the cursor keys, move to the end of the file and, if it is not already present, add the following:
ipv6
- Now press the keys CTRL and O at the same time, then press enter/return to confirm the file name - this will save the file.
- Now press the keys CTRL and X at the same time - this will close nano.
Note: nano is a simple text editor program that we will be using during this configuration process.
Run the command:
sudo service rpcbind start
Now we will install the NFS Kernel server - this is needed to run the server for the NFS drive.
sudo apt-get install nfs-kernel-server
Now we will start to create the actual shared location. Start by creating a directory in /home for sharing
sudo mkdir -p /home/shared_dir
mkdir is a command used to create a directory (which can contain other directories and files).
Now change its access permissions so that everyone can read or write to this directory:
sudo chmod 777 /home/shared_dir
chmod is a command to change the access permissions on a file or directory. Here 777 means that everyone can read, write or execute commands in that directory. If you only want the owner to be able to write and others to only read, you would use 755 for a directory (others need the execute permission to enter it) or 644 for a regular file. See the Wikipedia chmod entry for more details.
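To see what a mode number means in practice, the following sketch applies a mode to a directory and prints the resulting permission string. The helper name show_mode is hypothetical; try it on a throwaway directory rather than on the real shared_dir.

```shell
#!/bin/sh
# Apply a chmod mode to a directory and print its permission string.
# Each octal digit is read (4) + write (2) + execute (1), given in the
# order owner, group, others. (show_mode is a hypothetical helper name.)
show_mode() {
    dir="$1"
    mode="$2"
    chmod "$mode" "$dir"
    # ls -ld prints e.g. "drwxrwxrwx ..."; keep the first 10 characters.
    ls -ld "$dir" | cut -c1-10
}

# Example on a temporary directory:
# d=$(mktemp -d)
# show_mode "$d" 777
# show_mode "$d" 755
```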
Now we need to mount the shared_dir and bind it.
sudo mount --bind /home/shared_dir/ /home/shared_dir/
This means that the system recognises this as something which is a mountable drive in the system - so that it can share it to other computers.
Now we need to make sure the directory gets mounted and exported each time when the system boots up and that other computers can access it.
Run the following command:
sudo nano /etc/fstab
at the end of the file add the following:
/home/shared_dir /home/shared_dir none bind 0 0
- Now press the keys CTRL and O at the same time, then press enter/return to confirm the file name - this will save the file.
- Now press the keys CTRL and X at the same time - this will close nano.
To continue the configuration, run:
sudo nano /etc/default/nfs-kernel-server
Make sure that one of the following options is present in the opened file:
NEED_SVCGSSD=no
or
NEED_SVCGSSD="no"
or
NEED_SVCGSSD=""
or
NEED_SVCGSSD=
- Now press the keys CTRL and O at the same time, then press enter/return to confirm the file name - this will save the file.
- Now press the keys CTRL and X at the same time - this will close nano.
The configuration process continues with a mapping file, run:
sudo nano /etc/idmapd.conf
Ensure that under the [Mapping] section the following is present:
[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup
Now run the following to expose the new directory to the network:
sudo nano /etc/exports
At the end of the file add the following to export your shared_dir to the network:
/home/shared_dir 192.168.2.0/24(rw,nohide,insecure,no_subtree_check,async)
This line will expose your shared_dir to the network where all the addresses start with 192.168.2 and allow read and write access.
As a final check on this we will examine the following files:
/etc/init.d/nfs-kernel-server
/etc/init.d/nfs-common
/etc/init.d/rpcbind
Have a look at each of the above files in nano. Each one, near the top of the file, should have a line:
# Default-Start: 2 3 4 5
however in some this may say:
# Default-Start: S
You need to change all the files that have an S to 2 3 4 5 and then run:
sudo update-rc.d -f rpcbind remove
sudo update-rc.d rpcbind defaults
sudo update-rc.d -f nfs-common remove
sudo update-rc.d nfs-common defaults
sudo update-rc.d -f nfs-kernel-server remove
sudo update-rc.d nfs-kernel-server defaults
If any of these commands fail with error messages, try the following:
sudo apt-get purge rpcbind
sudo apt-get install nfs-kernel-server
and try to do the above update again for any of the operations that failed.
Before we go into setting up a DHCP server we will set up MPI. MPI is a core technology used in parallel and high performance computing that allows processes to communicate in a standard way and thus allows them to collaborate to solve problems together in parallel. We will install and test a simple program on a single node.
Run the following:
sudo apt-get install libxml2-dev
sudo apt-get install zlib1g zlib1g-dev
sudo apt-get install mpich
Now try running the command:
mpiexec --version
You should get output looking something like:
pi@beira:~ $ mpiexec --version
HYDRA build details:
    Version: 3.1
    Release Date: Thu Feb 20 11:41:13 CST 2014
    CC: gcc -D_FORTIFY_SOURCE=2 -g -O2 ...
    CXX: g++ -D_FORTIFY_SOURCE=2 -g -O2 ...
    F77: gfortran -g -O2 ...
    F90: gfortran -g -O2 ...
    Configure options: ...
    Process Manager: pmi
    Launchers available: ssh rsh fork slurm ll ...
    Topology libraries available: hwloc
    Resource management kernels available: user slurm ll lsf sge pbs cobalt
    Checkpointing libraries available: blcr
    Demux engines available: poll select
Note: The output above has been truncated so when the command is run the output will have more details.
Now we will write a small program to test that MPI works on our one Raspberry Pi.
Type the following commands which will change us to the shared_dir directory, where we will create a new directory, change into that directory, create a text file with the code content, compile the resulting program and run it.
cd /home/shared_dir
mkdir testprogram
cd testprogram
nano hello.c
Type the following:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
Now save the file and exit nano.
Type the following to compile the code:
mpicc -o hello hello.c
Type the command:
ls -al
This should show the directory contents - which will include an executable file called hello.
Now we will run this as an MPI program.
First we need to create a hostfile:
nano hostfile
where you need to type in the address of your Raspberry Pi followed by :4 (this tells MPI that it can run 4 processes on that machine, one per core on a Raspberry Pi 2 or 3), for example I need to type in:
192.168.2.18:4
Now save the file and exit nano.
To run the hello program type in the command:
mpiexec -n 4 -f hostfile ./hello
The mpiexec command will run the program hello as an MPI program with 4 processes on the systems identified in your hostfile. The four processes can communicate with each other, although in this case each process only writes out the name of the machine it is running on and its rank.
You should get output like this:
pi@beira:/home/shared_dir/testprogram $ mpiexec -n 4 -f hostfile ./hello
Hello world from processor beira, rank 0 out of 4 processors
Hello world from processor beira, rank 1 out of 4 processors
Hello world from processor beira, rank 2 out of 4 processors
Hello world from processor beira, rank 3 out of 4 processors
If you see output like this then congratulations - you have set up MPI on your system and run a basic MPI program on it. This is the first step in checking that things are working correctly.
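Once more nodes join the cluster, the hostfile needs one line per node in the same address:processes form. As a minimal sketch (make_hostfile is a hypothetical helper, and the second address in the example is only an illustration), the lines could be generated like this:

```shell
#!/bin/sh
# Print hostfile lines of the form "<address>:<processes>", one per node.
# Usage: make_hostfile <processes-per-node> <address>...
# (make_hostfile is a hypothetical helper name.)
make_hostfile() {
    slots="$1"
    shift
    for addr in "$@"; do
        echo "${addr}:${slots}"
    done
}

# Example (addresses are illustrative; use your own):
# make_hostfile 4 192.168.2.18 192.168.2.19 > hostfile
```

The -n value given to mpiexec would then normally match the total number of processes listed in the hostfile.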
To enable Wifi on this main node, you need to add a network to the wpa_supplicant.conf file.
Type the command:
sudo nano /etc/wpa_supplicant/wpa_supplicant.conf
at the end of the file add:
network={
    ssid="<your ssid>"
    psk="<your wifi code>"
}
Note: The SSID is the WiFi network identifier - this is an identifier broadcast to allow you to select the right WiFi network. If you are running a home network - this will likely be on your router. Otherwise you can check to see what WiFi network your main computer is connected to from the WiFi network properties.
As an example:
network={
    ssid="SKY64DBE"
    psk="UDAXEESQ"
}
If you have a system like eduroam please see the blog article: Eduroam Networking.
You may need to restart the wifi connection on your Raspberry Pi. Run the command:
sudo ifdown wlan0
Wait for a few seconds then run:
sudo ifup wlan0
If you run the command:
ifconfig wlan0
You should get the output for the wifi with an IP address. Sharing this connection will be done after the next part.
Now what we need to do is setup a DHCP server on this node - this will serve IP addresses to any other machine that wants to connect to its network and asks for one.
First we will install dnsmasq, to do this run the command:
sudo apt-get install dnsmasq
Once it is installed, we will need to configure the dhcp server, type:
sudo nano /etc/dnsmasq.conf
This will open the configuration file for dnsmasq.
Press CTRL and W which will start a search, type in domain-needed and press enter/return.
Remove the # from the beginning of that line (a # is a comment and tells the program to ignore the rest of the line, good for documenting your own changes).
Also remove the # from the line that has bogus-priv.
Find the line beginning server=/ (ignore lines starting with #) and change the line to read:
server=/cluster/<ip address of your main node>
so, for example I would use:
server=/cluster/192.168.2.18
Now find the line starting with local=/ and change it to:
local=/cluster/
Search for expand-hosts.
Remove the # from the beginning of that line.
Go to the line starting: #domain=
and change this to:
domain=cluster
Now go to the first line starting: #dhcp-range=
Remove the # and change it to:
dhcp-range=192.168.2.30,192.168.2.100,14d
This will set the range of IP addresses that can be assigned to client machines (192.168.2.30-192.168.2.100) and for how long they are assigned those addresses before they are renewed, known as lease time, in this case 14 days.
Save this file now and exit the editor.
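As a quick check, the lines changed in the steps above can be listed and compared with your own file. The sketch below simply prints those settings for a given main node address; dnsmasq_cluster_settings is a hypothetical helper, and it assumes the example DHCP range used in this guide.

```shell
#!/bin/sh
# Print the dnsmasq.conf lines edited in the steps above, for comparison
# with your own file. Usage: dnsmasq_cluster_settings <main-node-ip>
# (dnsmasq_cluster_settings is a hypothetical helper name; the DHCP range
# is the example range from this guide.)
dnsmasq_cluster_settings() {
    main_ip="$1"
    printf '%s\n' \
        'domain-needed' \
        'bogus-priv' \
        "server=/cluster/${main_ip}" \
        'local=/cluster/' \
        'expand-hosts' \
        'domain=cluster' \
        'dhcp-range=192.168.2.30,192.168.2.100,14d'
}

# Example: print the expected lines, then list the active
# (uncommented, non-empty) lines in your own file to compare by eye.
# dnsmasq_cluster_settings 192.168.2.18
# grep -v -e '^#' -e '^$' /etc/dnsmasq.conf
```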
Run the following command:
sudo nano /etc/resolv.conf
At the end of the file add nameserver <ip address of your main node>, for example:
nameserver 192.168.2.18
Now save and exit.
Then run:
sudo nano /etc/hosts
at the end of the file add <ip address of your main node> <hostname of your main node>, for example:
192.168.2.18 beira
Then save and exit.
Now we will fix the IP address of this Raspberry Pi - so that it is a fixed point we will always know.
There are two tested versions of Raspbian for these instructions - "Wheezy" and "Jessie". New installations of Raspbian should be "Jessie" (commonly on Raspberry Pi 3 computers); older Raspberry Pi 1 and 2 computers will, if not upgraded, likely be running "Wheezy". Use the instructions for your particular installation.
Network Configuration: Raspberry Pi "Wheezy"
For a Raspberry Pi running the Raspbian version denoted "Wheezy" (commonly non-upgraded Raspberry Pi 1 and 2) do the following:
sudo nano /etc/network/interfaces
In that file there will be a line:
iface eth0 inet manual
We will change this to use a static IP address. Replace the line with:
iface eth0 inet static
    address 192.168.2.18
    netmask 255.255.255.0
    network 192.168.2.0
    broadcast 192.168.2.255
    gateway 192.168.2.1
Note: Use the IP Address of your node in place of my example 192.168.2.18.
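If you use a different address, the network and broadcast values have to change with it. A minimal sketch of how they follow from the address and netmask is shown here; net_and_broadcast is a name I made up for illustration, and it handles plain dotted-quad IPv4 values only.

```shell
# Sketch: derive the network and broadcast addresses from an IPv4 address and
# netmask. Network = address AND mask, broadcast = address OR inverted mask.

net_and_broadcast() {
    # $1 = address, $2 = netmask, both in dotted-quad form
    old_ifs=$IFS
    IFS=.
    set -- $1 $2          # split both values into eight positional parameters
    IFS=$old_ifs
    network="$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
    broadcast="$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
    echo "network $network"
    echo "broadcast $broadcast"
}

net_and_broadcast 192.168.2.18 255.255.255.0
```

For the example address this prints network 192.168.2.0 and broadcast 192.168.2.255.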
Save this file and exit now.
Now start the service and reboot:
sudo service dnsmasq start
sudo reboot
Then log back into the Raspberry Pi.
Network Configuration: Raspberry Pi "Jessie"
For Raspberry Pi computers running the Raspbian version denoted "Jessie" (commonly Raspberry Pi 3), run:
sudo nano /etc/dhcpcd.conf
At the bottom of the file add:
interface eth0
static ip_address=192.168.2.18/24
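The /24 at the end of the address is the prefix length, which is another way of writing the netmask. The sketch below converts a prefix length into the dotted form; prefix_to_netmask is a hypothetical helper name, not part of Raspbian.

```shell
# Sketch: convert a CIDR prefix length (0-32) into a dotted-quad netmask.

prefix_to_netmask() {
    # $1 = prefix length, e.g. 24
    bits=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$bits" -ge 8 ]; then
            octet=255                                   # a fully set octet
            bits=$(( bits - 8 ))
        else
            octet=$(( 256 - (1 << (8 - bits)) ))        # partially set or empty octet
            bits=0
        fi
        mask="${mask}${mask:+.}${octet}"                # join octets with dots
    done
    echo "$mask"
}

prefix_to_netmask 24
```

A prefix of 24 corresponds to 255.255.255.0, the same netmask used in the "Wheezy" configuration.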
Now start the service and reboot:
sudo service dnsmasq start
sudo reboot
Then log back into the Raspberry Pi.
On All Systems
If you disable the internet sharing on your laptop/desktop you will find that you get an IP address from the Pi on the ethernet.
If you have configured wifi to connect to your internet, you can share this across the network being served by the Raspberry Pi via its ethernet connection.
On the host Raspberry Pi run:
sudo nano /proc/sys/net/ipv4/ip_forward
Change the value 0 to 1, then save and exit.
Then run:
sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
This will share the connection across any machines getting IP addresses from this Pi.
However if you reboot the Pi, this sharing will be lost.
What we need to do is make the iptables persistent and the forwarding permanent.
To do this run:
sudo nano /etc/sysctl.conf
For the line #net.ipv4.ip_forward=1 remove the #.
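If you would rather not edit the file by hand, the same change can be sketched with sed. This assumes GNU sed (the version shipped with Raspbian) and that the commented line looks exactly like the one quoted above; enable_ip_forward is an illustrative name of my own.

```shell
# Sketch: uncomment the net.ipv4.ip_forward=1 line in a sysctl.conf-style file.
# Assumption: GNU sed, which supports in-place editing with -i.

enable_ip_forward() {
    # $1 = path to the file to edit (try it on a copy first)
    sed -i 's/^#[[:space:]]*net\.ipv4\.ip_forward=1/net.ipv4.ip_forward=1/' "$1"
}

# Example on a copy, so the real file is untouched until you are happy:
#   cp /etc/sysctl.conf /tmp/sysctl.conf
#   enable_ip_forward /tmp/sysctl.conf
#   diff /etc/sysctl.conf /tmp/sysctl.conf
```

Working on a copy first lets you see the one-line difference before touching the real /etc/sysctl.conf, which needs sudo to change.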
Now run the following commands to make the iptables changes load each time:
sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
sudo apt-get install iptables-persistent
When it asks you if you want to save the current configuration, say yes both times.
This will add the wifi sharing as a permanent feature of your cluster.
Now you have made changes to your main node which will do the following:
- Share a network drive.
- Run a DHCP Server.
- Run an MPI program.
- Share Internet Connections.
Setting up the worker nodes
Now what we need to do is configure the worker nodes.
First we need to know what addresses they have - if you are continuing to use internet sharing from a laptop/desktop use the instructions from earlier.
If you are sharing from your new master node, then you can find out the IP addresses like this:
less /var/lib/misc/dnsmasq.leases
This will list the machines connected to the master node.
One of these will be your laptop/desktop.
1471486087 b8:27:eb:1d:52:f6 192.168.2.55 raspberrypi 01:b8:27:eb:1d:52:f6
1471486084 b8:27:eb:47:a1:27 192.168.2.39 * 01:b8:27:eb:47:a1:27
1471486081 b8:27:eb:cf:a3:29 192.168.2.62 * 01:b8:27:eb:cf:a3:29
1471486081 b8:27:eb:0c:4d:4e 192.168.2.72 * 01:b8:27:eb:0c:4d:4e
1471482976 34:15:9e:07:d4:42 192.168.2.47 Khonsu 01:34:15:9e:07:d4:42
You will notice some of the lines have a "*" - this is because all the worker nodes have the same hostname which doesn’t really work on a network. Your laptop or desktop should have a name, in my example it is Khonsu.
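The leases file has one machine per line, with the lease expiry time, MAC address, IP address, hostname and client ID in that order. A small sketch for pulling out just the address and name of each machine follows; list_leases is a name I have chosen for illustration.

```shell
# Sketch: print "IP hostname" for every entry in a dnsmasq leases file,
# marking entries whose hostname is recorded as "*".

list_leases() {
    # $1 = path to the leases file
    awk '{ name = ($4 == "*") ? "(no hostname)" : $4; print $3, name }' "$1"
}

# Example on the main node:
#   list_leases /var/lib/misc/dnsmasq.leases
```

This makes it easier to spot which addresses still need a hostname set.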
We are going to fix this with a manual configuration change on each node, although a script run on all the nodes would save a lot of typing. If this is not the first cluster you have built, you can write such a script; if it is your first, it may be more reassuring to make the changes by hand.
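For those who do want a script, here is a minimal sketch of the idea: loop over the worker addresses and run the same command on each over ssh. The function name run_on_nodes and the DRY_RUN variable are my own inventions; by default the function only prints what it would run, so you can check the list before doing anything.

```shell
# Sketch: run one command on several nodes over ssh.
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.

run_on_nodes() {
    # $1 = command to run, remaining arguments = node addresses
    cmd=$1
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh pi@$node $cmd"
        else
            ssh "pi@$node" "$cmd"
        fi
    done
}

# Dry run against the worker addresses from the example leases:
run_on_nodes "hostname" 192.168.2.55 192.168.2.39 192.168.2.62 192.168.2.72
```

Interactive tools such as raspi-config do not suit this approach, so the hostname and timezone steps are still easiest to do by hand.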
So on each node you need to login:
ssh pi@<address>
You will get the ssh certificate warning which you can ignore, then you will need to type your password in.
Once logged in - you will need to run the raspi-config again and choose the same settings options as you did for the master node - just with a different hostname and remember to set the timezone - see the main system configuration. Here is the list of raspi-config steps:
- Choose option 3 Boot Options.
- Now choose option B1 Text console and hit enter/return.
- Now choose the option 9 Advanced Options.
- Choose the Option A2 Hostname and hit enter/return.
- Set Hostname
- Choose Option 5 Internationalisation Options.
- Choose the T2 Change Timezone option.
- From the list choose your area, for me it is Europe
- Then choose your closest location, I chose London.
For my example, I will just call them worker01-worker04 so if I look at the leases in dnsmasq:
1471486724 b8:27:eb:1d:52:f6 192.168.2.55 worker01 01:b8:27:eb:1d:52:f6
1471486758 b8:27:eb:47:a1:27 192.168.2.39 worker02 01:b8:27:eb:47:a1:27
1471486815 b8:27:eb:cf:a3:29 192.168.2.62 worker03 01:b8:27:eb:cf:a3:29
1471486841 b8:27:eb:0c:4d:4e 192.168.2.72 worker04 01:b8:27:eb:0c:4d:4e
1471482976 34:15:9e:07:d4:42 192.168.2.47 Khonsu 01:34:15:9e:07:d4:42
I can login to the nodes with their IP Address now if the configuration of the server has worked correctly.
You will need to generate an ssh key with ssh-keygen on each machine.
Now you will have four workers each with their own ssh keys.
Now you will have to update them all.
You can open up multiple terminals and on each worker run the following command:
sudo rpi-update
This means all the workers are updating at the same time.
Once the update has completed - you will need to reboot.
To save a lot of typing - what we will do is set up passwordless login within the cluster - we need this for parallel programs to work and it saves us time as well.
First we will set up passwordless login from the master node to the worker node.
Log into the master node and then for each of your workers run the following:
ssh-copy-id pi@<worker node>
Each time you will need to enter a password.
Now you will need to do this on each of the workers - doing an ssh-copy-id to all the other nodes.
Now that will take a little time to type in but it will save time in the future.
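To keep track of which copies are still to be done, the sketch below prints every ssh-copy-id command needed so that each node can reach every other node. It only prints the commands and does not run them; copy_id_commands is a hypothetical helper name.

```shell
# Sketch: list the ssh-copy-id commands for passwordless login between
# every ordered pair of nodes in the cluster.

copy_id_commands() {
    # arguments = names or addresses of all nodes, main node included
    for from in "$@"; do
        for to in "$@"; do
            if [ "$from" != "$to" ]; then
                echo "on $from: ssh-copy-id pi@$to"
            fi
        done
    done
}

copy_id_commands 192.168.2.18 192.168.2.55 192.168.2.39 192.168.2.62 192.168.2.72
```

With 5 nodes this comes to 20 commands, which is why it takes a little while to type.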
On each worker run:
sudo nano /etc/hosts
At the end of the file add:
<ip address of your server node> <name of your server>
for my example:
192.168.2.18 beira
Now we need to update the OS on each worker.
Run the command:
sudo apt-get update && sudo apt-get upgrade
You will need to confirm on each worker that you want the install to continue.
This will take some time to install even if they are all going at the same time.
After they are updated we need to install some additional packages:
sudo apt-get install libxml2-dev
sudo apt-get install zlib1g zlib1g-dev
sudo apt-get install mpich
Now you should be able to run:
mpiexec --version
However, we are not quite finished with the software setup. We need to setup the mounted shared drive.
Start on each node by doing this:
sudo mkdir /home/shared_dir
sudo chmod 777 /home/shared_dir/
On each node run:
sudo apt-get purge rpcbind
sudo apt-get install nfs-common
On each node, have a look at the following two files in nano:
/etc/init.d/nfs-common
/etc/init.d/rpcbind
Each file should have a line near the top with:
# Default-Start: 2 3 4 5
However, as before, in some it may have:
# Default-Start: S
Change all the S to 2 3 4 5.
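With several nodes and two files on each, this edit gets repetitive. A hedged sketch of the same change using sed is given here; it assumes GNU sed and that the line contains only S after Default-Start:, and fix_default_start is a name of my own.

```shell
# Sketch: change "# Default-Start: S" to "# Default-Start: 2 3 4 5" in an
# init script. Lines that already list run levels are left alone.

fix_default_start() {
    # $1 = path to the init script (try it on a copy first)
    sed -i 's/^# Default-Start:[[:space:]]*S[[:space:]]*$/# Default-Start:     2 3 4 5/' "$1"
}

# Example on a copy:
#   cp /etc/init.d/rpcbind /tmp/rpcbind
#   fix_default_start /tmp/rpcbind
#   grep Default-Start /tmp/rpcbind
```

Checking the result with grep before replacing the real file (which needs sudo) is a cheap way to avoid breaking the service.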
Then run:
sudo update-rc.d -f rpcbind remove
sudo update-rc.d rpcbind defaults
sudo update-rc.d -f nfs-common remove
sudo update-rc.d nfs-common defaults
We are going to address the shared nfs drive by its IP address:
sudo mount 192.168.2.18:/home/shared_dir /home/shared_dir
This will only mount the drive a single time - when you reboot the Pi the changes will be lost.
So we need to add it to some configuration that will do this automatically, run:
sudo nano /etc/fstab
At the end of the file add:
192.168.2.18:/home/shared_dir /home/shared_dir nfs rw,hard,intr,noauto,x-systemd.automount 0 0
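If you script this step, it is worth making sure the line is not added twice when the script is run again. A minimal sketch, using a hypothetical helper called add_fstab_entry:

```shell
# Sketch: append a line to an fstab-style file only if that exact line
# is not already present, so repeated runs do not create duplicates.

add_fstab_entry() {
    # $1 = file to edit, $2 = complete line to add
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example (run as root on each worker, using your own main node address):
#   add_fstab_entry /etc/fstab \
#     "192.168.2.18:/home/shared_dir /home/shared_dir nfs rw,hard,intr,noauto,x-systemd.automount 0 0"
```

The grep options match the whole line as a fixed string, so an entry for a different address would still be added.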
Now you can reboot everything so on each node run:
sudo reboot
Now we will test our simple hello world across multiple nodes - login to the master node:
cd /home/shared_dir/testprogram
nano hostfile
In the hostfile add the IP addresses of your nodes (not the ones in the example unless they match up) with a :4 at the end like this:
192.168.2.18:4
192.168.2.72:4
192.168.2.55:4
192.168.2.39:4
192.168.2.62:4
Save and exit nano.
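The :4 after each address tells mpiexec how many processes to place on that node, one per core in this example. The total across the hostfile is the natural value for -n, and the sketch below adds it up. total_slots is a name I made up, it expects one entry per line, and it assumes an entry without a count stands for a single process.

```shell
# Sketch: add up the per-node process counts in an MPICH-style hostfile
# whose lines look like "address:count".

total_slots() {
    # $1 = path to the hostfile
    awk -F: 'NF { total += ($2 == "" ? 1 : $2) } END { print total + 0 }' "$1"
}

# Example:
#   total_slots hostfile
```

For five nodes with :4 each the total is 20.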
Run the following command:
mpiexec -n 20 -f hostfile ./hello
and it should give output like:
Hello world from processor worker03, rank 16 out of 20 processors
Hello world from processor worker03, rank 17 out of 20 processors
Hello world from processor worker03, rank 18 out of 20 processors
Hello world from processor worker03, rank 19 out of 20 processors
Hello world from processor worker04, rank 4 out of 20 processors
Hello world from processor worker04, rank 5 out of 20 processors
Hello world from processor worker01, rank 8 out of 20 processors
Hello world from processor worker04, rank 6 out of 20 processors
Hello world from processor worker01, rank 9 out of 20 processors
Hello world from processor worker04, rank 7 out of 20 processors
Hello world from processor worker01, rank 10 out of 20 processors
Hello world from processor worker01, rank 11 out of 20 processors
Hello world from processor worker02, rank 12 out of 20 processors
Hello world from processor worker02, rank 13 out of 20 processors
Hello world from processor worker02, rank 14 out of 20 processors
Hello world from processor worker02, rank 15 out of 20 processors
Hello world from processor beira, rank 1 out of 20 processors
Hello world from processor beira, rank 3 out of 20 processors
Hello world from processor beira, rank 0 out of 20 processors
Hello world from processor beira, rank 2 out of 20 processors
Congratulations, if you got this then you have put together a basic cluster for running parallel programs on.
To turn off a Raspberry Pi, type:
sudo halt
To see how fast your cluster is, the next instruction set will show you how to install a benchmark and how to run it.
Running a benchmark
Linpack is a commonly used benchmark to test how “fast” a supercomputer can go. What we are going to do is install linpack on your cluster and test how fast your cluster is.
The first thing to do is to log into your cluster and connect it to the internet.
We are going to run the HPL - High Performance Linpack Benchmark on your cluster, more information on this can be found at Linpack.
Software Setup:
On each node, we need some additional software so run the following command on each of the nodes:
sudo apt-get install libatlas-base-dev gfortran
Now use the following command to change to the shared drive:
cd /home/shared_dir
Now, create a linpack directory:
mkdir linpack
cd linpack
Now download the HPL source:
wget http://www.netlib.org/benchmark/hpl/hpl-2.2.tar.gz
This file is a compressed archive to make it quicker to download, we will now extract the archive:
tar xf hpl-2.2.tar.gz
Change into the new directory and into the setup directory:
cd hpl-2.2/setup
Run the following command:
sh make_generic
Change to the parent directory:
cd ..
Copy the Make.UNKNOWN file and call it Make.rpi.
cp setup/Make.UNKNOWN Make.rpi
Now we will edit the file:
nano Make.rpi
Press CTRL+W and search for ARCH.
The line should say ARCH = UNKNOWN.
Change UNKNOWN to rpi.
Search for TOPdir.
Change the value for TOPdir to:
/home/shared_dir/linpack/hpl-2.2
We could set the first part of this to an environment variable but that is another topic.
A few lines below you will find three blank entries: MPdir, MPinc and MPlib.
Set these, and the LA entries that follow them, to the values shown below:
MPdir = /usr/lib
MPinc = -I $(MPdir)/mpich/include
MPlib = $(MPdir)/arm-linux-gnueabihf/libmpich.a
LAdir = /usr/lib/atlas-base/
LAinc = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a
LAlib = -lblas
Save the file and exit nano, then type the command:
make arch=rpi
This will compile the benchmark.
Now we need to create a data file to test with:
cd bin/rpi
nano HPL.dat
Now enter the following text into the HPL.dat file:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
5040         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
Save the file and exit nano.
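The Ns value is the order N of the matrix the benchmark solves. The matrix holds N x N double precision values of 8 bytes each, so the memory needed grows with the square of N. The sketch below estimates the matrix size in MiB; hpl_matrix_mib is an illustrative name, and the figure ignores the smaller work buffers HPL also allocates.

```shell
# Sketch: approximate memory needed for the HPL matrix, in whole MiB.
# The matrix has N*N entries of 8 bytes each (double precision).

hpl_matrix_mib() {
    # $1 = problem size N (the Ns value in HPL.dat)
    awk -v n="$1" 'BEGIN { printf "%d\n", int(n * n * 8 / 1048576) }'
}

hpl_matrix_mib 5040
```

For N = 5040 this comes to about 193 MiB, which fits comfortably in the 1 GB of memory on a Raspberry Pi 3. The same function can be used to check a larger N before running on the whole cluster, where the matrix is divided between the nodes.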
Then in the bin/rpi directory run:
./xhpl
This will produce output like:
================================================================================
HPLinpack 2.2 -- High-Performance Linpack benchmark -- February 24, 2016
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      : 5040
NB     : 128
PMAP   : Row-major process mapping
P      : 1
Q      : 1
PFACT  : Right
NBMIN  : 4
NDIV   : 2
RFACT  : Crout
BCAST  : 1ringM
DEPTH  : 1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
and this will take some time to run.
Then in the end it will produce output like:
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4        5040   128     1     1             317.11              2.693e-01
HPL_pdgesv() start time Thu Aug 18 09:28:47 2016
HPL_pdgesv() end time   Thu Aug 18 09:34:04 2016
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0021492 ...... PASSED
================================================================================

Finished 1 tests with the following results:
         1 tests completed and passed residual checks,
         0 tests completed and failed residual checks,
         0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
The numbers will vary, but this result shows that a single Raspberry Pi 3 achieves about 269.3 Mflops (see the Gflops column at the top; 2.693e-01 Gflops is 269.3 Mflops).
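The Gflops figure can be cross-checked from N and the run time. Solving a dense system of order N takes roughly 2/3 N^3 floating point operations plus a smaller N^2 term; the 3/2 N^2 term used below is my recollection of what HPL counts and should be treated as an assumption, although at this size it makes no visible difference. hpl_gflops is a name of my own.

```shell
# Sketch: estimate the HPL rate from the problem size and wall time.
# Assumed operation count: 2/3 * N^3 + 3/2 * N^2 floating point operations.

hpl_gflops() {
    # $1 = problem size N, $2 = wall time in seconds
    awk -v n="$1" -v t="$2" 'BEGIN {
        flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n   # total operations
        printf "%.3e\n", flops / t / 1e9                # operations per second, in Gflops
    }'
}

hpl_gflops 5040 317.11
```

For N = 5040 and 317.11 seconds this gives 2.693e-01, matching the figure reported by the benchmark.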
Now what we will do is run LINPACK across our cluster.
The first thing you need to do is to create a hostfile with the IP addresses of your nodes:
192.168.2.18:4
192.168.2.72:4
192.168.2.55:4
192.168.2.39:4
192.168.2.62:4
Now you will need to edit your HPL.dat:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
17400        Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4            Ps
8            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
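The Ps and Qs values describe a grid of P x Q processes. As far as I know, HPL needs at least P x Q MPI processes to be started, so it is worth checking the grid size against the number you pass to mpiexec with -n. The sketch below reads the two values from HPL.dat and multiplies them; hpl_grid_size is a hypothetical helper and it only handles a file with a single process grid.

```shell
# Sketch: report the process grid size (P x Q) described by an HPL.dat file.
# Assumption: one value each on the "Ps" and "Qs" lines.

hpl_grid_size() {
    # $1 = path to HPL.dat
    awk '$2 == "Ps" { p = $1 } $2 == "Qs" { q = $1 } END { print p * q }' "$1"
}

# Example, run in the bin/rpi directory:
#   hpl_grid_size HPL.dat
```

With Ps 4 and Qs 8 the grid is 32 processes, which is more than the 20 provided by five nodes with four cores each. If the run complains about needing more processes, a 4 x 5 grid (Ps 4, Qs 5) matches 20 processes exactly.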
Then run the program using mpiexec:
mpiexec -n 20 -f hostfile ./xhpl
This will output information like the last run, but for the whole cluster.
For my example cluster a test run came out at 2.403 Gflops.
Your speeds will vary. This is the default setup for Linpack on this cluster - we have been exploring ways of getting better performance - for example we can change some of the software libraries in the benchmark for more efficient ones, this has proven to give far better performance numbers.
Further instructions for tweaking your cluster and installing demos from EPCC and ARCHER will be appearing in the future. Monitor our blog at https://www.epcc.ed.ac.uk/blog and Twitter feed at https://twitter.com/EPCCed for related announcements. You can also track this page for updates at Wee Archlet HTML and the Git repository.
Note: If the network connections appear to break connection (error message comes up) or freeze (network becomes unresponsive), add the following line to the file /etc/sysctl.conf:
vm.min_free_kbytes = 16384