Alike SR cannot be replugged


Problem: ABD jobs, such as Xen backups, hang, and your Alike SR shows as unplugged. ABD deployment operations may also receive the Xen error: SR_BACKEND_FAILURE_47

Solution: Forcibly unmount the A3 NFS share from the Xen console.


Your A3 will automatically deploy a special SR to your Xen pools called the Alike SR. This SR helps perform critical functions like backup and restore.


Should your A3 go offline or be powered off, this SR will likely show as unplugged in XenCenter. In most such cases, your A3 will automatically repair and replug it when it is next needed for a job. But under some circumstances, such as power loss to your A3, your A3 won't be able to repair it.


The solution is to log into your Xen host dom0 console and type

mount |grep your.A3.IP.here

This will return entries like:

mount | grep 192.168.2.74

192.168.2.74:/xen/1ac6808f-0005-43bd-bb1c-d90f154f526d on /run/sr-mount/1ac6808f-0005-43bd-bb1c-d90f154f526d type nfs4 (rw,relatime,vers=4.1,rsize=262144,wsize=262144,namlen=255,acdirmin=0,acdirmax=0,soft,proto=tcp,timeo=100,retrans=3...


Copy the mount point path (the part after "on" that begins with /run/sr-mount/) and paste it into the command below:

umount -f <your-path-here>

Ex: umount -f /run/sr-mount/1ac6808f-0005-43bd-bb1c-d90f154f526d
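The lookup step above can also be sketched as a small shell helper that prints only the mount point paths. This is a minimal sketch, not part of A3 or Xen: the function name a3_mount_points is hypothetical, and it assumes the mount output has the same "source on path type ..." layout as the example above.

```shell
# Hypothetical helper (not part of A3 or Xen): reads `mount` output on stdin
# and prints the mount point of every entry whose source starts with "<ip>:".
# Assumes mount points contain no spaces, as with /run/sr-mount/<uuid>.
a3_mount_points() {
    awk -v ip="$1" 'index($1, ip ":") == 1 && $2 == "on" { print $3 }'
}

# Example, run in dom0 (prints paths only, unmounts nothing):
#   mount | a3_mount_points 192.168.2.74
# After reviewing the list, each path can be passed to: umount -f <path>
```

Matching on the IP followed by a colon avoids picking up other addresses that merely start with the same digits, which a plain grep on the IP could do.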

This will forcibly disconnect this NFS link to your A3 VM or Docker host. You will need to repeat this operation on all other members of the pool. Once this is complete, you can return to XenCenter, unplug the SR, and forget it. You can then re-run an ABD diagnostic job and/or ABD deploy to replug the A3 SR.

If you receive the error "Target is busy" or similar when running the umount command, this means Xen is hung trying to communicate with your SR. This is fixable! Use this command to find hung processes:

ps aux | grep 1ac6808f-0005-43bd-bb1c-d90f154f526d

Replace the UUID in this example with the one shown in the output of the mount command above. (This is also the UUID of your Alike SR; they are one and the same.) Do not put brackets around the UUID, because grep would interpret them as a character class.

This command will return a list of processes that are utilizing your SR. You can kill these processes using

kill -9 <pid>

Obtain the process number from the second column returned from the ps command above.
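Picking the process numbers out of the ps output can be sketched as a helper as well. This is a minimal sketch under stated assumptions: the function name sr_pids is hypothetical, and it assumes the standard "ps aux" column layout, where the PID is the second column.

```shell
# Hypothetical helper (not part of A3 or Xen): reads `ps aux` output on stdin
# and prints the PID (second column) of every line that mentions the SR UUID.
# The UUID is passed to awk through the environment rather than on awk's
# command line, so the awk process itself does not show up as a match.
sr_pids() {
    SR_UUID="$1" awk 'index($0, ENVIRON["SR_UUID"]) > 0 { print $2 }'
}

# Example, run in dom0 (prints PIDs only, kills nothing):
#   ps aux | sr_pids 1ac6808f-0005-43bd-bb1c-d90f154f526d
# Review the matching processes before passing any PID to kill -9.
```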

After killing these processes, re-run the umount command. It should now succeed with no errors returned. Then you can return to XenCenter, unplug the SR, and forget it. Finally, return to your A3 to re-run your diagnostic job.


Note:

If you encounter any SR Inconsistent errors after this process, stopping and restarting the A3 services from the console (docker up/down) should resolve them.