Apr 19, 2007

Clustering ISO ready

After a grueling effort, the ISO image for the installation of "Aventurin{e} - Clustering" was finished today.

Posted by: mstauber

The work started last Saturday, and I honestly believed that it would be ready by late Monday afternoon. Instead of just a couple of hours, it took five days and 90 work hours. Part of the problem was that the post-install scripts that set up the actual clustering have to be pretty sophisticated. At every turn you have to verify whether the transaction went through just fine, whether you've already been there, or whether there was an error.
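The "did it go through / have I been here before / was there an error" checks described above amount to idempotent step execution. Below is a minimal Python sketch of that general pattern under stated assumptions. It is not the actual Aventurin{e} post-install code, which isn't shown here, and the `Step` class and `run_steps` function are hypothetical names chosen for illustration.

```python
"""Hypothetical sketch of an idempotent post-install step runner.

Each step says how to detect that it was already applied, how to apply it,
and how to verify that it worked. All names here are illustrative.
"""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    already_done: Callable[[], bool]  # "have we already been here?"
    apply: Callable[[], None]         # perform the actual change
    verify: Callable[[], bool]        # "did the transaction go through?"


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order and return a log; stop at the first failure."""
    log = []
    for step in steps:
        if step.already_done():
            log.append(f"{step.name}: skipped (already done)")
            continue
        try:
            step.apply()
        except Exception as exc:  # record the error instead of crashing
            log.append(f"{step.name}: error ({exc})")
            break
        if step.verify():
            log.append(f"{step.name}: ok")
        else:
            log.append(f"{step.name}: verification failed")
            break
    return log


if __name__ == "__main__":
    state = set()  # stands in for real system state (files, services, ...)

    def make_step(name):
        return Step(
            name=name,
            already_done=lambda: name in state,
            apply=lambda: state.add(name),
            verify=lambda: name in state,
        )

    steps = [make_step("create-volume"), make_step("start-sync")]
    print(run_steps(steps))  # first run: both steps report "ok"
    print(run_steps(steps))  # second run: both steps are skipped
```

Because every step checks for itself whether it already ran, the whole script can be re-run safely after an interrupted install, which is exactly the situation a reboot in the middle of setup creates.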

So this required a lot of testing, which meant:

  1. Update the post-install scripts
  2. Rebuild the RPM that contains the post-install scripts
  3. Build a new ISO image
  4. Install the ISO image on two test servers
  5. Boot the test servers and perform the post-install procedure

I had two test rigs on which I could install the ISOs:

Two Tyan GS12 servers with SATA-RAID, and two VMware virtual machines (with SCSI-RAID) running on Windows XP.

Installing the ISO image on VMware is, of course, slightly faster than installing it on "real" servers, as you don't have to burn actual CD-ROMs. You can just tell VMware to boot from the newly created ISO image.

But still: Steps two to five usually took around 30 minutes each time. That's a long wait to test just a couple of lines of new code <sigh>.

I didn't count how often I went through these steps, but it feels like I did it a hundred times.

Once the ISO checked out fine on VMware, I went ahead and tried to install it on the "real" servers. But the result was quite a letdown: one server wouldn't even boot after the OS install.

After some debugging I discovered a weird glitch in Anaconda (the Red Hat installer) that occurs only on systems that use software RAID and GRUB, have a standalone /boot partition, and are rebooted before the RAID has finished synchronizing the /boot partition. It took a couple of hours to troubleshoot and fix that.
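The post doesn't say exactly how the Anaconda glitch was fixed. One plausible guard, sketched here purely as an assumption, is to refuse to reboot while Linux software RAID is still synchronizing. The sketch parses text in the format of /proc/mdstat and flags arrays whose status lines show a resync or recovery in progress (or one that is DELAYED/PENDING). The names `arrays_still_syncing` and `safe_to_reboot` are hypothetical, and `SAMPLE` is illustrative output, not something captured from the test servers.

```python
"""Hypothetical pre-reboot check: is any md RAID array still syncing?

Parses text in the format of Linux's /proc/mdstat. This is an illustrative
guard, not the fix actually used for the Anaconda issue.
"""
import re
from typing import List

# Array header lines look like: "md0 : active raid1 sdb1[1] sda1[0]"
_ARRAY = re.compile(r"^(md\d+)\s*:")
# Progress lines look like: "[==>....]  resync = 12.5% (...)" or "resync=DELAYED"
_PROGRESS = re.compile(r"\b(?:resync|recovery)\s*=\s*(?:[\d.]+%|DELAYED|PENDING)")


def arrays_still_syncing(mdstat_text: str) -> List[str]:
    """Return the names of arrays that have a resync/recovery status line."""
    syncing = []
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        if current and _PROGRESS.search(line) and current not in syncing:
            syncing.append(current)
    return syncing


def safe_to_reboot(path: str = "/proc/mdstat") -> bool:
    """True if no array in the given mdstat file is still syncing."""
    with open(path) as f:
        return not arrays_still_syncing(f.read())


SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      [==>..................]  resync = 12.5% (13056/104320) finish=0.2min speed=10000K/sec

md1 : active raid1 sdb2[1] sda2[0]
      8385856 blocks [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(arrays_still_syncing(SAMPLE))  # ['md0']
```

A post-install script could poll `safe_to_reboot()` in a loop and only trigger the reboot once it returns True, so that /boot on the mirrored array is fully synchronized first.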

The next issue was that the intended partitioning scheme, as defined in the ISO's kickstart files, wasn't working out all that well. Under certain circumstances the partitioning made it difficult to determine which partition should be clustered by Aventurin{e}. So I had to redo all the kickstart files with a much stricter definition of what to partition and how.

Finally, in the early morning hours of Thursday, I had it just right. It installed perfectly on VMware with either software RAID or a single SATA/SCSI disk. It also worked flawlessly on the Tyan GS12s. During Friday I reinstalled it a couple of times (with and without RAID) on the test boxes and got a flawless install every time.

So by now the ISO image for Aventurin{e} - Clustering is pretty solidly wrapped up.

