LVM: disable udev sync to avoid “udevd timeout: killing watershed” error

While setting up a simple cluster with LVM running on top of DRBD and managed by Pacemaker/Corosync, I ran into a problem: the LVM resources did not come up after a reboot.

The cluster was running on Ubuntu 12.04 (3.5.0-36 kernel). LVM logical volumes were used for data storage only (mapped to iSCSI target LUNs), while the system itself was running on physical disks.

The Heartbeat OCF resource agent logged “ERROR: LVM: MyVolumeGroup did not activate correctly”, and the cluster status was stuck in this state:

Resource Group: MyCluster
     LVM_Group     (ocf::heartbeat:LVM):   Started MYHOSTNAME (unmanaged) FAILED

To recover manually, I had to deactivate the volume groups and restart the cluster manager every time:

MYHOSTNAME:~# vgchange -a n 
MYHOSTNAME:~# service corosync restart

Further investigation of the logs turned up some udevd errors:

udevd[2083]: timeout: killing 'watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'' [4934]
udevd[2083]: 'watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'' [4934] terminated by signal 9 (Killed)
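
To check whether a node is hitting the same timeout, the log can be searched for the first of those two messages. This is only a minimal sketch: the function name find_watershed_timeouts is made up for illustration, and /var/log/syslog as the default log file is an assumption that may not match every setup.

```shell
#!/bin/sh
# Print every log line in which udevd reports killing the 'watershed' LVM helper.
# $1: log file to scan (assumed default: /var/log/syslog)
find_watershed_timeouts() {
    log_file="${1:-/var/log/syslog}"
    # Match e.g.: udevd[2083]: timeout: killing 'watershed sh -c ...
    grep -E "udevd\[[0-9]+\]: timeout: killing 'watershed" "$log_file"
}
```

Calling find_watershed_timeouts with no argument scans the assumed default file; passing a path scans that file instead. The function returns a non-zero status when nothing matches, as grep does.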

Something was going wrong in the synchronisation between LVM and udev, leaving the resource manager deadlocked or failed, so I decided to bypass it by setting the udev_sync parameter to 0 (zero) in the activation section of /etc/lvm/lvm.conf:

[...]
activation {
    udev_sync = 0   # please read further - do it at your own risk
    [...]
}
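
After editing the file, it can be useful to confirm which value is actually set. The following is a rough sketch rather than an official LVM tool: get_udev_sync is a hypothetical helper that reads the file as plain text, skips commented-out lines, and does not verify that the key sits inside the activation section.

```shell
#!/bin/sh
# Print the value assigned to udev_sync in an lvm.conf-style file.
# $1: configuration file to read (default: /etc/lvm/lvm.conf)
# Simplification: if the key appears more than once, the last occurrence is printed.
get_udev_sync() {
    conf_file="${1:-/etc/lvm/lvm.conf}"
    awk '
        /^[[:space:]]*#/ { next }                 # skip comment-only lines
        /^[[:space:]]*udev_sync[[:space:]]*=/ {
            sub(/#.*/, "")                        # drop any trailing comment
            split($0, kv, "=")                    # kv[2] holds the raw value
            gsub(/[[:space:]]/, "", kv[2])        # strip surrounding whitespace
            value = kv[2]
        }
        END { if (value != "") print value }      # print nothing if the key is absent
    ' "$conf_file"
}
```

With the change above in place, get_udev_sync should print 0 on the node; an empty result means the key was not found in the file.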

This solution worked very well: the LVM resources started coming up properly after a reboot and, to this day, they are up and running and fully manageable by Pacemaker.

A couple of web searches turned up Ubuntu Bug #995645, which may be related to my case. Of course, if LVM had been used for system storage, the lack of synchronisation with udev could have caused many other problems, but that was not my case.

At the time of writing, the bug is still marked as confirmed but unassigned.