2010-12-29

Howto make your private VM Cluster, Part III

Continuing with my saga, next up is DRBD. I'm using DRBD because I also want to test it as a viable alternative for network RAID.

To use DRBD I first created an LVM volume to use:
lvcreate -n drbd-demo -L 100M internalhd
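Just to double check that the volume exists (optional; this assumes the internalhd volume group created in Part II):
lvs internalhd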
Then I configured DRBD; fortunately Gentoo simplifies a great part of the process (you have to do this on both nodes):
cd /etc
cp /usr/share/doc/drbd-*/drbd.conf.bz2 .
bunzip2 drbd.conf.bz2
Then I created a wwwdata resource by first configuring it (again on both nodes). This is done by creating a file /etc/drbd.d/wwwdata.res with the contents:
resource wwwdata {
    meta-disk internal;
    device    /dev/drbd1;
    syncer {
        verify-alg sha1;
    }
    net {
        allow-two-primaries;
    }
    on node1 {
        disk    /dev/mapper/internalhd-drbd--demo;
        address 192.168.100.10:7789;
    }
    on node2 {
        disk    /dev/mapper/internalhd-drbd--demo;
        address 192.168.100.11:7789;
    }
}
I added the drbd module to /etc/modules.autoload.d/kernel-2.6 on both nodes so it is loaded at boot.
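On Gentoo that boils down to appending a single line to the autoload file, something like:
echo drbd >> /etc/modules.autoload.d/kernel-2.6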

Finally I started drbd on node 1 as follows:
drbdadm create-md wwwdata
modprobe drbd
drbdadm up wwwdata
And on node 2 as follows:
drbdadm --force create-md wwwdata
modprobe drbd
drbdadm up wwwdata
I then used node 1 as reference for the data:
drbdadm -- --overwrite-data-of-peer primary wwwdata
I monitored the sync process until it was completed with:
watch cat /proc/drbd
When the sync completed I created the file system and populated it with an index.html that identifies the cluster (as opposed to the per-node pages created in Part II):
mkfs.ext4 /dev/drbd1
mount /dev/drbd1 /mnt
# Create an index.html in /mnt (see the example below)
umount /dev/drbd1
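The index.html itself can be anything that identifies the shared volume; mine was roughly a one-liner like this, written while /dev/drbd1 was still mounted on /mnt:
echo "<html><body><h1>Served from the DRBD volume</h1></body></html>" > /mnt/index.html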
I configured the cluster to use drbd as follows (this will enter the crm shell but don't panic):
crm
cib new drbd
configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata op monitor interval=30s
configure ms WebDataClone WebData meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
cib commit drbd
quit
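To check that the cluster actually took over DRBD and promoted one of the nodes, something along these lines should do (the node running the master should report itself as Primary):
crm_mon -1
drbdadm role wwwdata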
After this I configured a WebFS resource so that lighttpd serves from the DRBD backed volume.
crm
cib new webfs
configure primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd/by-res/wwwdata" directory="/var/www/localhost/htdocs" fstype="ext4"
configure colocation WebFS-on-WebData inf: WebFS WebDataClone:Master
configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
configure colocation WebSite-with-WebFS inf: WebSite WebFS
configure order WebSite-after-WebFS inf: WebFS WebSite
cib commit webfs
quit
After this, if you go to your cluster web page (http://192.168.100.20) you will see the contents of the index.html that you created for the cluster.

You can use "crm_mon" to monitor the cluster and look at /var/log/message to view error messages. To simulate a full service relocation go the the node were the services are running and issue "crm node standby" this will put the current node on standby forcing the services to be moved to the other node. After that you can do "crm node online" to bring the node back online.
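In other words, on the node currently running the services:
crm node standby
crm_mon   # watch ClusterIP, WebSite, WebFS and the DRBD master move to the other node
crm node online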

This concludes this series. Maybe I'll put up another one on having NFS use the DRBD volume; it depends on free time.

Howto make your private VM Cluster, Part II

In the previous entry I showed how to build the basic structure for your own private VM cluster. Today I'm going to show you how to create a cluster with two VMs that provides high availability for a web server, using DRBD to replicate the web site.

First you need to create a virtual machine. I decided to create a VM with Gentoo. I will use LVM so the volume used for DRBD can stay small, since this is a simple test.

I created a qemu image for a base Gentoo installation (my goal is to install Gentoo on the VM and then reuse it as the base for the other VMs). To create the image just run:
qemu-img create -f qcow2 gentoo.qcow2 10G
I started the VM using that image and followed Gentoo's installation guide. My partition scheme was 100MB (boot), 512MB (swap), 5GB (root), rest for LVM.

I used gentoo-sources, configuring all the virtio devices, DRBD and the device-mapper. I configured genkernel to use LVM so it detects the LVM volumes at boot. I used GRUB and added all the genkernel boot options.
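For reference, these are roughly the kernel options I mean (names from a 2.6 series kernel) and the genkernel call I have in mind; double check both against the versions you are using:
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BLK=y
CONFIG_VIRTIO_NET=y
CONFIG_BLK_DEV_DM=y
CONFIG_BLK_DEV_DRBD=m

genkernel --lvm --menuconfig all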

Remember that if you use the minimal installation CD the hard disk will be called sda, but after setting up virtio in the kernel it will be called vda. I created a generic script that provides the disk and network card through virtio. I called it startVM:
#!/bin/bash
set -x

if [[ $# != 2 && $# != 3 ]]; then
    echo "Usage: startVM <mac> <hda> [cdrom]"
    exit 1
fi

# Get the location of the script
SCRIPT=`readlink -f $0`
SCRIPT_PATH=`dirname $SCRIPT`

# Create tap interface so that the script /etc/qemu-ifup can bridge it
# before qemu starts
USERID=`whoami`
IFACE=`sudo tunctl -b -u $USERID`

# Setup KVM parameters
CPUS="-smp 8"
MEMORY="-m 1G"
MACADDRESS=$1
HDA=$2
if [[ $# == 3 ]]; then
    CDROM="-cdrom $3"
else
    CDROM=""
fi
NET="-net nic -net tap,script=/etc/qemu-ifup"

# Start kvm
kvm $CPUS $MEMORY -drive file=$HDA,if=virtio,boot=on $CDROM -net nic,model=virtio,macaddr=$MACADDRESS -net tap,ifname=$IFACE,script=$SCRIPT_PATH/qemu-ifup

# kvm has stopped - remove tap interface
sudo tunctl -d $IFACE &> /dev/null
After successfully booting into the Gentoo VM I halted it. Since I may want to reuse this vanilla Gentoo image in the future, I created another disk image using the vanilla one as its backing file:
qemu-img create -f qcow2 -o backing_file=gentoo.qcow2 gentoo-drbd.qcow2
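qemu-img can confirm that the new image really points at the vanilla one as its backing file, which is an easy thing to get wrong:
qemu-img info gentoo-drbd.qcow2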
Then I started the VM with the new script with:
startVM DE:AD:BE:EF:E3:1D gentoo-drbd.qcow2
The MAC Address will be important later.

Now I installed all the packages that I will need on each node in the cluster. The goal is to avoid having to build them several times and to reuse this image for all nodes. The basic steps are as follows:
emerge -av telnet-bsd drbd lvm2 pacemaker lighttpd
eselect python set python2.6
pvcreate /dev/vda4
vgcreate internalhd /dev/vda4
I switched Python to 2.6 because pacemaker requires it. I added the following to /etc/hosts to avoid having to do it everywhere:
192.168.100.10 node1
192.168.100.11 node2
192.168.100.20 cluster
Next I created the images for the two nodes as follows:
qemu-img create -f qcow2 -o backing_file=gentoo-drbd.qcow2 gentoo-drbd-node1.qcow2
cp gentoo-drbd-node1.qcow2 gentoo-drbd-node2.qcow2
Then I created two scripts, one to start each VM. The contents are as follows (for node2 you must change the MAC address and the image name):
#!/bin/bash
set -x

# Get the location of the script
SCRIPT=`readlink -f $0`
SCRIPT_PATH=`dirname $SCRIPT`

$SCRIPT_PATH/startVM DE:AD:BE:EF:E3:1D gentoo-drbd-node1.qcow2
I then configured static IPs for each node, that is: edit /etc/conf.d/hostname to be either node1 or node2 and set the contents of /etc/conf.d/net to the following (for node 2 the IP ends in 11):
config_eth0=( "192.168.100.10/24" )
routes_eth0=( "default via 192.168.100.1" )
Next I configured corosync. This must be done on both nodes:
cd /etc/corosync
cp corosync.conf.example corosync.conf
Edit the corosync.conf file and set bindnetaddr to the IP address of the node (an example follows below). Then add the pacemaker service by appending the following to the end of the file:
service {
   name: pacemaker
   ver: 0
}
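For reference, bindnetaddr lives in the totem interface section of corosync.conf; on node1 mine would look roughly like this (the multicast address and port are just whatever the example file ships with):
totem {
    # other options left as shipped in the example file
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.10
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}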
I started corosync on both nodes and marked it to start on boot:
/etc/init.d/corosync start
rc-update add corosync default
Then I proceeded to configure the cluster. First I turned off STONITH:
crm configure property stonith-enabled=false
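crm_verify checks the live configuration and complains loudly if something is invalid, so it is a cheap sanity check after each change:
crm_verify -L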
Then I created the Virtual IP for the Cluster:
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=192.168.100.20 cidr_netmask=32 op monitor interval=30s
I marked the cluster to run with two nodes (without quorum, please don't discuss this in the comments) and set resource stickiness (to avoid having the resources move around when not needed):
crm configure property no-quorum-policy=ignore
crm configure rsc_defaults resource-stickiness=100
I added lighttpd as a service. First I created a different index.html on each node so that I can check which node is serving. Next I created the service in the cluster:
crm configure primitive WebSite lsb:lighttpd op monitor interval=30s
crm configure colocation website-with-ip INFINITY: WebSite ClusterIP
crm configure order lighttpd-after-ip mandatory: ClusterIP WebSite

You can access the cluster address (http://192.168.100.20) to see who's answering. You can stop the corosync service on one node to watch the service migrate to the other node.
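A quick way to test from the host (or any machine on the bridge), assuming curl is installed:
curl http://192.168.100.20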

Next up DRBD.

2010-12-28

Howto make your private VM Cluster, Part I

I wanted to make some experiments with DRBD, pacemaker and others. The goal is to test a few configurations for a high-availability scenario. Since I don't want to make changes to my machine, the solution is to use virtual machines. I decided to go all open source and use KVM. Since I want more than a single VM, I decided to set up a private bridge to which all the VMs connect. The bridge provides DHCP and DNS services and NAT + firewall to the internet.

I use Gentoo. If you use another distribution or firewall tool you should adapt these instructions and scripts to fit your needs.

First I created a script that sets up my bridge. The script is as follows (I called it setupBridge):
#!/bin/bash
set -x

# Setup the bridge
sudo brctl addbr br0
sudo ifconfig br0 192.168.100.1 netmask 255.255.255.0 up

# Launch DNS and DHCP Server on br0
sudo dnsmasq -q -a 192.168.100.1 --dhcp-range=192.168.100.50,192.168.100.150,forever --pid-file=/tmp/br0-dnsmasq.pid

# Launch updated firewall configuration
sudo firehol /home/nsousa/KVM/firehol-br0.conf start
The bridge is created and I give it the 192.168.100.1 IP address. I use dnsmasq to provide DNS and DHCP; the addresses handed out by DHCP will be from 192.168.100.50 to .150. Finally I adapt the firewall using firehol. Here is the firehol configuration file (firehol-br0.conf):
version 5

# Allow all traffic in the Bridge
interface br0 bridge
    server all accept
    client all accept

# Accept all client traffic on any interface
interface any world
    client all accept

# NAT to the internet
router bridge2internet inface eth0 outface br0
    masquerade reverse
    client all accept

# Bridge Routing
router bridge-routing inface br0 outface br0
    server all accept
    client all accept
This basically allows the VMs to connect to any service running on br0 (the host). It also allows anything to act as a client on any interface (so the host itself can reach the internet). Finally it sets up NAT using masquerade so the VMs can use the bridge to access the internet, but as clients only.
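A quick way to confirm the bridge came up and dnsmasq started (nothing fancy, just basic checks):
brctl show
ifconfig br0
cat /tmp/br0-dnsmasq.pid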

To undo all these changes I have a script that tears all this setup down. I called it teardownBridge and here are its contents:
#!/bin/bash
set -x

# Stop DHCP and DNS Server
sudo kill -15 `cat /tmp/br0-dnsmasq.pid`
sudo rm /tmp/br0-dnsmasq.pid

# Stop the bridge
sudo ifconfig br0 down
sudo brctl delbr br0

# Reset the firewall
sudo firehol /etc/firehol/firehol.conf start
Before moving to the script that starts one VM I first need to show the script that sets up the interface for qemu to use (I called it qemu-ifup):
#!/bin/bash
set -x

if test $(/sbin/ifconfig | grep -c $1) -gt 0; then
    sudo /sbin/brctl delif br0 $1
    sudo /sbin/ifconfig $1 down
fi

sudo /sbin/ifconfig $1 0.0.0.0 promisc up
sudo /sbin/brctl addif br0 $1
The goal here is to add the newly created tap device to the bridge. I first remove it if it is already there.

Last but not least, the script that starts a VM. This script sets up a tap device for the current user, configures the KVM start-up parameters and starts KVM. When KVM ends it removes the tap device to keep the system clean. Here are the contents of the script, adapted to use a hard disk and the minimal install CD for Gentoo (I called it startVM):
#!/bin/bash
set -x

# Create tap interface so that the script /etc/qemu-ifup can bridge it
# before qemu starts
USERID=`whoami`
IFACE=`sudo tunctl -b -u $USERID`

# Setup KVM parameters
MEMORY="-m 512"
IMGPATH="/home/nsousa/KVM"
IMAGE=gentoo.qcow2
CDROM="-cdrom /home/nsousa/Downloads/install-x86-minimal-20101123.iso"
MACADDRESS="DE:AD:BE:EF:5D:A3"
NET="-net nic -net tap,script=/etc/qemu-ifup"

# Start kvm
kvm $MEMORY -hda $IMGPATH/$IMAGE $CDROM -net nic,macaddr=$MACADDRESS -net tap,ifname=$IFACE,script=/home/nsousa/KVM/qemu-ifup

# kvm has stopped - remove tap interface
sudo tunctl -d $IFACE &> /dev/null
The next step is to make a minimal Gentoo installation on a VM to use as the base for all the VMs in the cluster. I'll probably refactor the startVM script to separate the common parts (the per-VM parts will be the disk image to use and the MAC address).

Stay tuned for more updates.