Last Feb 23, I presented a paper documenting the commissioning of our new Beowulf cluster at the 8th Philippine Computing Congress, which was held at the University of the Philippines–Diliman. My presentation took place in the new Computer Science building (beside the EEE building).
Here is a short description of the paper:
Title: Building and Benchmarking a New Beowulf Cluster for Grid Computing and Other Applications
Authors: Allan Espinosa and Rafael Saldana
In this paper, we report the upgrade of our AGILA Beowulf cluster. Commodity desktop computers were used for the compute nodes. The server node was set up as a high-end, server-class machine to house terabytes of data from the university's scientific computing applications, such as cellular automata, molecular dynamics, mesoscale climate modeling, and computational models requiring high-performance computing infrastructure.
Embedded below is my presentation. Enjoy!
Technorati tags: Beowulf, computing, clusters
Published February 15, 2008
Tags: clustering, linux
The vanilla install of Rocks Cluster 4.3 uses version 2.6.9-55EL of the Linux kernel. Native support for the Realtek 8111B (r8168) did not arrive until 2.6.19.xx. I downloaded 188.8.131.52 from kernel.org. After rebuilding the kernel, you have to enable it to map the hardware ID of the device to the correct module (r8169). Here is an archive of the files that I used to build the driver:
rocks-boot-drivers.tar.gz: I added the r8168 directory and modified the subdirs file so that this module is built for the kernel. It does not actually compile anything, since there are no entries in the SOURCE variable of the Makefile. Extract this tarball into your Rocks CVS tree ($ROCKS-SRC-ROOT/src/roll/kernel/src/rocks-boot/enterprise/4/images). The following entry was added to drivers/r8168/:
0x10ec 0x8168 "r8169" "RealTek RTL8168B/8111B, RTL8168C/8111C Gigabit Ethernet controller"
Here, 0x10ec 0x8168 is the hardware ID (PCI vendor and device) of my GigE controller, and r8169 is the module that should be loaded for it.
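To make the role of that entry concrete, here is a minimal Python sketch of how a line in this format ties a hardware ID to a module name. It is only an illustration and not the code Rocks or the installer actually runs; the function names `parse_pcitable_line` and `find_module` are hypothetical, and the four-field layout is assumed from the entry above.

```python
import shlex

# The entry from this post, with both quoted fields closed.
ENTRY = ('0x10ec 0x8168 "r8169" '
         '"RealTek RTL8168B/8111B, RTL8168C/8111C Gigabit Ethernet controller"')


def parse_pcitable_line(line):
    """Split one entry into (vendor, device, module, description).

    Assumes four whitespace-separated fields, the last two double-quoted.
    """
    fields = shlex.split(line)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    vendor, device, module, description = fields
    return int(vendor, 16), int(device, 16), module, description


def find_module(lines, vendor, device):
    """Return the module mapped to a vendor/device pair, or None if unmapped."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        v, d, module, _ = parse_pcitable_line(line)
        if (v, d) == (vendor, device):
            return module
    return None


if __name__ == "__main__":
    # Without the entry the NIC has no module; with it, r8169 is selected.
    print(find_module([], 0x10EC, 0x8168))
    print(find_module([ENTRY], 0x10EC, 0x8168))
```

The first lookup corresponds to the stock situation, where nothing claims the 0x10ec 0x8168 ID, and the second to the table after the extra entry is added.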
Then I followed the instructions in the "Creating a Custom Kernel RPM" and "Adding a Device Driver" sections of the User Guide.
Good luck in building your cluster!
Published January 11, 2008
Tags: clustering, linux
I am currently building a new Beowulf cluster using Rocks 4.3, which uses Linux kernel version 2.6.9-55EL. After building the master node, it is time to install the compute nodes. Under normal conditions, when everything goes smoothly, the Rocks kickstart system boots the compute nodes from the network via a dhcp-tftp-kickstart combination. But our Rocks installer could not load the network driver. Once I had identified the NIC, I downloaded the Realtek 8111B (r8168) driver from the vendor's site. I followed the instructions in the Rocks documentation on how to add a custom device driver to the kernel. That procedure basically creates an initrd.img file in which the kernel modules are installed. But the boot sequence did not load the kernel module properly. I had a couple of email exchanges over the mailing list with Greg Bruno, one of the Rocks developers, to diagnose the problem. Building a custom kernel module for the current kernel, however, did not solve the missing-module problem.
Upon further investigation, I found that the driver for my NIC had been incorporated into the r8169 module of the vanilla kernel, so I downloaded kernel version 184.108.40.206. First, the kernel*.rpm packages must be built and installed in the /home/install Rocks repository. Next, the rocks-boot package must be rebuilt to incorporate the new kernel. But because CentOS uses an older version of the kernel, the hardware ID of my NIC is not associated with the r8169 kernel module. So I created a dummy device driver in the rocks-boot repository: in the Makefile, I removed the source file to be compiled and simply added an entry for my driver in the pcimap.
Now my compute nodes were able to grab the kickstart file and install an entire operating system in 10 minutes!
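As a side note on "older version of the kernel": the stock release string 2.6.9-55EL has to be compared number by number with 2.6.19, the series where native support for this NIC appeared, because a plain string comparison gets the order wrong. The snippet below is a rough Python sketch of that check; the helper name `kernel_version_tuple` is made up for illustration and it ignores everything after the dotted numeric prefix.

```python
import re


def kernel_version_tuple(release):
    """Return the leading dotted numeric part of a release string as ints.

    "2.6.9-55EL" becomes (2, 6, 9); the "-55EL" suffix is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", release)
    if match is None:
        raise ValueError(f"no version number in {release!r}")
    return tuple(int(part) for part in match.group(0).split("."))


stock = kernel_version_tuple("2.6.9-55EL")   # kernel shipped with Rocks 4.3
needed = kernel_version_tuple("2.6.19")      # first series with native support

if __name__ == "__main__":
    # Numeric comparison: 9 < 19, so the stock kernel is too old.
    print("stock kernel new enough:", stock >= needed)
    # Naive string comparison would claim the opposite ("9" sorts after "1").
    print("string comparison says:", "2.6.9-55EL" >= "2.6.19")
```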