XenServer multipath configuration for LIO targets

A XenServer multipath.conf with special support for LIO-based iSCSI targets, tuned to maximize multipath performance and stability. The path_grouping_policy setting (group_by_prio versus multibus) doesn’t seem to matter in most basic setups. Lines that XenServer 6.1 reported as invalid have also been removed.

http://djlab.com/stuff/xs61/multipath.conf

Specifically:

        device {
                vendor "LIO-ORG"
                product "*"
                path_grouping_policy group_by_prio
                path_checker tur
                getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout "/sbin/mpath_prio_alua /dev/%n"
                path_selector "round-robin 0"
                hardware_handler "1 alua"
                rr_weight uniform
                rr_min_io 2
                failback immediate
        }
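After copying the file into place, it can be worth confirming that the LIO stanza actually made it into the active config before reloading multipath. The helper below is only a minimal sketch: the function name has_lio_stanza is made up for illustration, and the default path /etc/multipath.conf is an assumption that may differ on your XenServer release.

```shell
# Return success if the given multipath config contains a device stanza
# for vendor "LIO-ORG".
# $1 = config file (assumed default: /etc/multipath.conf)
has_lio_stanza() {
    grep -Eq '^[[:space:]]*vendor[[:space:]]+"LIO-ORG"' "${1:-/etc/multipath.conf}"
}

# Usage:
#   has_lio_stanza && echo "LIO-ORG stanza found"
```

Once the config is loaded, multipath -ll should list the maps and their path groups for the LIO LUNs.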

Migrate a XenServer VM without a Pool or Shared Storage

The source is now on GitHub; please help me make it better! If you just want a 32-bit binary (to run on Dom0), download it from here instead.

With the help of Ben Booth’s Xen::API (Perl module) I put together a VM migration script to export a VM directly to another host with no intermediary file. The transfer occurs over XAPI with no temp files or local disk interaction. This script can run directly on the source or destination host, or any server in between. Bear in mind, you will have the best speeds and least network overhead running this directly on the destination host.

As of today, MigrateVM has been tested and works fine on XenServer 5.6 through 6.5.

Options:

-sh : source host
-su : source user (usually root)
-sp : source pass
-sv : source VM label or UUID
-dh : destination host
-du : destination user
-dp : destination pass
-ds : destination SR (optional)

If any of the options are omitted, you will be prompted for them.
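For unattended runs, every option can be supplied up front so nothing is prompted. The wrapper below is a rough sketch, not part of MigrateVM itself: it assumes each flag takes its value as the next argument, and the names migrate_vm, SRC_PASS, DST_PASS and MIGRATEVM are hypothetical, chosen only for this example.

```shell
# Run migratevm non-interactively using the options listed above.
# Assumption: each flag is followed by its value as a separate argument.
# SRC_PASS / DST_PASS are hypothetical environment variables holding the
# passwords; MIGRATEVM optionally overrides the path to the binary.
# $1 = source host, $2 = VM label or UUID, $3 = destination host,
# $4 = destination SR (optional)
migrate_vm() {
    src_host=$1 vm=$2 dst_host=$3 dst_sr=${4:-}
    set -- -sh "$src_host" -su root -sp "$SRC_PASS" -sv "$vm" \
           -dh "$dst_host" -du root -dp "$DST_PASS"
    # Only pass -ds when a destination SR was requested.
    if [ -n "$dst_sr" ]; then
        set -- "$@" -ds "$dst_sr"
    fi
    "${MIGRATEVM:-./migratevm}" "$@"
}

# Usage:
#   SRC_PASS=... DST_PASS=... migrate_vm 1.2.3.4 my_vm localhost
```

Keep in mind that passwords passed as arguments are visible in the process list while the transfer runs.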

Example output:

[root@cl-ash-h1 ~]# ./migratevm
Enter source host name/IP (blank = localhost): 1.2.3.4
Enter username for 1.2.3.4 (blank = root):
Enter password for 1.2.3.4: ************
Enter source vm name or uuid on 1.2.3.4: my_vm
Enter destination host name/IP (blank = localhost):
Enter username for localhost (blank = root):
Enter password for localhost: ******
Destination SR on localhost (blank for default):
Starting transfer
...................    12.0%, 30618.43 (KB/sec)
Done.

Download the script like this:

wget http://djlab.com/stuff/migratevm-1.0.2.tar.gz
tar zxf migratevm-1.0.2.tar.gz && cd migratevm-1.0.2
./migratevm

If you get ‘bad ELF’ or a similar error on a 64-bit system, try installing the 32-bit glibc, for example:

Older XenServers: yum install glibc.i686
XenServer 6.5:  yum install glibc.i686 --enablerepo=base --enablerepo=updates --disablerepo=citrix

Binary and source are both included in the tarball.

Version 1.0.2 has an updated binary build which should now run on XenServer 6.5. We needed to statically link expat into the binary because it is no longer installed by default on XS 6.5.

X9SCM / X9SCL Network Timeout

Supermicro X9SCM and X9SCL motherboards will lose their network connection after some heavy traffic, especially on RHEL/CentOS 6. Updating the BIOS and driver does not always fix this:

Oct 19 18:32:49 zeus kernel: ------------[ cut here ]------------
Oct 19 18:32:49 zeus kernel: WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x26d/0x280() (Not tainted)
Oct 19 18:32:49 zeus kernel: Hardware name: X9SCL/X9SCM
Oct 19 18:32:49 zeus kernel: NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
Oct 19 18:32:49 zeus kernel: Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nfs lockd fscache nfs_acl auth_rpcgss sunrpc nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length vhost_net xt_hl xt_tcpmss macvtap xt_TCPMSS macvlan iptable_mangle iptable_filter xt_multiport xt_limit tun xt_dscp ipt_REJECT ip_tables kvm_intel kvm vzevent ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse snd_pcsp snd_pcm snd_timer video i2c_i801 tpm_tis tpm tpm_bios serio_raw i2c_core snd shpchp output soundcore snd_page_alloc ext3 jbd mbcache ahci e1000e [last unloaded: scsi_wait_scan]
Oct 19 18:32:49 zeus kernel: Pid: 4, comm: ksoftirqd/0 veid: 0 Not tainted 2.6.32-15-pve #1
Oct 19 18:32:49 zeus kernel: Call Trace:
Oct 19 18:32:49 zeus kernel: <IRQ> [<ffffffff8106c608>] ? warn_slowpath_common+0x88/0xc0
Oct 19 18:32:49 zeus kernel: [<ffffffff8106c6f6>] ? warn_slowpath_fmt+0x46/0x50
Oct 19 18:32:49 zeus kernel: [<ffffffff8147c6fd>] ? dev_watchdog+0x26d/0x280
Oct 19 18:32:49 zeus kernel: [<ffffffff8107fcac>] ? run_timer_softirq+0x1bc/0x380
Oct 19 18:32:49 zeus kernel: [<ffffffff8147c490>] ? dev_watchdog+0x0/0x280
Oct 19 18:32:49 zeus kernel: [<ffffffff81075413>] ? __do_softirq+0x103/0x260
Oct 19 18:32:49 zeus kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30

So far, the only fix that has worked for me is doing two things.

1. Patch the NIC firmware (do this for both eth0 and eth1):

Download Here

2. Then add the following kernel parameter in your grub.conf (or menu.lst depending on OS flavor):

pcie_aspm=off

Now reboot. You shouldn’t lose your network again.
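After the reboot it is easy to confirm that the parameter actually reached the kernel. The helper below is a small sketch; the function name has_kernel_param is invented for this example, and it simply looks for an exact match in /proc/cmdline (or another file, which is handy for testing).

```shell
# Return success if the given kernel parameter is present on the command line.
# $1 = parameter to look for (e.g. pcie_aspm=off)
# $2 = file to read (defaults to /proc/cmdline)
has_kernel_param() {
    # Put each parameter on its own line, then require an exact line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}

# Usage:
#   has_kernel_param pcie_aspm=off && echo "ASPM disabled at boot"
```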

Kickstart 4-disk RAID10 Recipe

Here’s a nice recipe for a RAID10 array made up of four SSDs. Tested to work on CentOS 6 (RHEL 6). Be sure to add the discard option in fstab for TRIM support.

zerombr yes
bootloader --location=partition --driveorder=sda,sdb,sdc,sdd
clearpart --all --initlabel --drives=sda,sdb,sdc,sdd
part raid.100000 --size=250 --ondisk=sda
part raid.100001 --size=250 --ondisk=sdb
part raid.100002 --size=250 --ondisk=sdc
part raid.100003 --size=250 --ondisk=sdd
part raid.100007 --size=1 --grow --ondisk=sdd
part raid.100006 --size=1 --grow --ondisk=sdc
part raid.100005 --size=1 --grow --ondisk=sdb
part raid.100004 --size=1 --grow --ondisk=sda
raid /boot --fstype ext3 --level=RAID1 --device=md0 raid.100000 raid.100001 raid.100002 raid.100003
raid pv.100008 --fstype "physical volume (LVM)" --level=RAID10 --device=md1 raid.100004 raid.100005 raid.100006 raid.100007
volgroup vg --pesize=65536 pv.100008
logvol swap --fstype swap --name=SystemSwap --vgname=vg --size=4096
logvol / --fstype ext4 --name=SystemRoot --vgname=vg --size=1 --grow
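The discard option mentioned above has to be added to fstab separately, for example from a %post script or by hand after install. The function below is one possible sketch: the name add_discard is hypothetical, and it only prints the modified table rather than editing the file in place, so the result can be reviewed first.

```shell
# Print an fstab-style file with ",discard" appended to the mount options
# of one mount point, unless discard is already present.
# $1 = fstab-style file, $2 = mount point (e.g. /)
add_discard() {
    awk -v mp="$2" '
        # Leave comments and short or blank lines untouched.
        /^[[:space:]]*#/ || NF < 4 { print; next }
        # Field 2 is the mount point, field 4 the option list.
        $2 == mp && $4 !~ /(^|,)discard(,|$)/ { $4 = $4 ",discard" }
        { print }
    ' "$1"
}

# Usage:
#   add_discard /etc/fstab / > /tmp/fstab.new   # review, then copy into place
```

Note that awk rewrites a changed line with single spaces between fields; all other lines are printed exactly as they were.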

Xen 4 and Libvirt From Source on CentOS 6

Install some prerequisites:

yum groupinstall "Development Libraries" "Development Tools"

yum install mercurial python-devel dev86 iasl ncurses-devel ncurses \
glib2-devel glib2 openssl-devel yajl-devel libuuid-devel libuuid \
pciutils-devel pciutils texinfo kernel-xen bridge-utils  gnutls gnutls-devel \
libxml2 libxml2-devel libnl libnl-devel libxslt libxslt-devel pygtk2 xorg-x11-xauth \
xorg-x11-fonts* device-mapper* gnome-python2-gconf pygtk2-libglade dbus-x11 \
gtk-vnc-python netcf netcf-devel netcf-libs vte vte-devel

Pull the source code and build Xen.

cd /usr/src
hg clone -r RELEASE-4.1.2 http://xenbits.xensource.com/xen-4.1-testing.hg
cd xen-4.1-testing.hg/
make dist -j4
make install

Build and install Libvirt management tools.

cd /usr/src
wget http://libvirt.org/sources/libvirt-0.9.12.tar.gz
tar -zxf libvirt-0.9.12.tar.gz
cd libvirt-0.9.12/
./configure --prefix=/usr
make -j4
make install
ldconfig

cd /usr/src
wget http://virt-manager.org/download/sources/virtinst/virtinst-0.600.1.tar.gz
tar -zxf virtinst-0.600.1.tar.gz
cd virtinst-0.600.1/
python setup.py install

cd /usr/src
wget http://virt-manager.org/download/sources/virt-manager/virt-manager-0.9.1.tar.gz
tar -zxf virt-manager-0.9.1.tar.gz
cd virt-manager-0.9.1/

Install the xen-enabled Dom0 kernel:

yum install http://au1.mirror.crc.id.au/repo/kernel-xen-release-6-3.noarch.rpm
yum install kernel-xen

Edit /etc/grub.conf: adjust the options on the first ‘xen.gz’ line and change the next two lines to start with ‘module’.

        kernel /xen.gz dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1 dom0_vcpus_pin
        module /vmlinuz-2.6.32.57-2.el6xen.x86_64 ro root=UUID=efff8fe3-523b-4620-a01f-d948cd43c49a rd_MD_UUID=836f9712:2e50a8a6:b1eabaa6:19f7ff34 rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 quiet SYSFONT=latarcyrheb-sun16 rhgb crashkernel=auto rd_MD_UUID=0698abc6:b72a8b69:f1e3b4e8:2a9fc55f rd_NO_LVM rd_NO_DM rhgb quiet
        module /initramfs-2.6.32.57-2.el6xen.x86_64.img

Set up the init scripts.

chkconfig --add xencommons
chkconfig --add xend
chkconfig --add xen-watchdog
chkconfig --add libvirtd
chkconfig --add libvirt-domains
chkconfig xencommons on
chkconfig xend on
chkconfig xen-watchdog on
chkconfig libvirtd on
chkconfig libvirt-domains on

Now it’s time to reboot, and manage your domains with virt-install and/or virt-manager.

Some issues that I haven’t been able to solve:

1. CentOS paravirtual domains hang during bootup in Kudzu because of the VNC framebuffer. They also fail to power off after shutdown and hang with 100% CPU usage. This doesn’t appear to be a libvirt-specific issue; I could replicate it with pure Xen as well. To work around this, disable the VNC framebuffer by running virt-install with --nographics as follows.

virt-install -n centos -r 2048 --file /dev/vg0/centos --os-variant=rhel6 \
--nographics -p -l http://mirror.fastserv.com/centos/6/os/x86_64/ -b virbr0 -d

Unfortunately, virt-manager now has no way to access the text console. From now on you have to use virsh console [domain] from the command line.

2. The libvirt Xen driver does not support managedsave and ends up terminating the DomU ungracefully when Dom0 reboots. If you happen to have a DomU on an MD RAID-backed LVM, this will crash Dom0 with a kernel oops as MD attempts to go read-only with domains still attached. If anyone knows a workaround, I am keen to hear it. Until then, I really can’t use this setup in any production environment.

3. If you try to work around this by changing /etc/sysconfig/libvirt-domains to shut down the DomUs instead of trying (unsuccessfully) to save them, libvirt attempts to shut down Domain-0 and hangs the shutdown process until the timeout (default 300 seconds) is reached.

4. CentOS 6.2 seems to have a buggy e1000e driver (at least when used on an X9SCL+-F motherboard) and at one point went completely offline requiring a hard power cycle. Research reveals I’m not the only one with issues with this combination.

My final thought is that Xen+Libvirt is certainly not a production-ready combination. Every time I thought I had solved a problem I uncovered several more, and I finally gave up after (3). Unfortunately, I don’t have enough time to work these bugs out and had to use Ubuntu+KVM in a crunch to get things done.