A2 Emergency Procedures
A2 Emergency Restore Support Procedures
Most customers use the A2 for routine backup and restore use cases without incident. In rare cases, additional restore interventions are needed, especially when environments are compromised.
This document contains the procedures that Quadric Software after-hours engineers are trained to follow. You can use the same procedures to resolve an incident yourself, or to prepare for one.
1) If your original A2 installation is lost, download and configure a new A2 and reattach its ADS and/or ODS. Follow the Quick Start Guide here: http://docs.quadricsoftware.com/kb/guides/a2-quick-start-guide/
2) Place the A2 in read-only mode:
From the Console Menu, select Advanced (6) -> A2 System Options (4) -> Configure Read-Only Mode (8).
3) If you need to recover individual files or disks, see RESTFS below.
4) If you need to restore a VM or VM disk, see VMREST below.
In an emergency, it's critical to seek workarounds and try multiple avenues when the first approach does not work. Quadric's procedure is always to fall back to the simplest and most direct restore path, which is to recover VHD/VHDX files from the RestoreFS. See RESTFS below.
Boot and A2 System
Problem (SERVBOOT): Not all A2 services are running on the A2 console after a reboot.
Solution: Please follow these steps:
1) Select option 4 from the A2 console to drop to shell.
2) type the following:
Problem (SERVDS): Not all A2 services are running on the A2 console, and you have tried SERVBOOT already.
Solution: Verify that your ADS and ODS are mounted correctly. Remount them if needed. If you do not need your ODS, detach it. All storage configuration is done under console option 2.
Problem (VERS): You have problems restoring data and the A2 tells you a newer version is available. Should you upgrade?
Solution: No. Never perform an upgrade during an emergency restore event unless specifically instructed to by Quadric Support during business hours.
Problem (RESTFS): You need to restore files or whole disks individually
Solution: Use the RestoreFS described here: http://docs.quadricsoftware.com/admin/6/restore.html. See also FLR and FSDISK.
Problem (REBUILD): All the services appear to be running, but when you cd into the restoreFS as described in PRF, it hangs.
Solution: After a reboot or a data storage reattach, the ADS or ODS may be rebuilding its cache state. In typical environments this takes 5-10 minutes at most. In very large environments, with tens of thousands of protected VM versions or more, it can take longer, and in rare cases an hour or more. You can check whether the A2 is performing a rebuild, and follow its progress, on the web UI dashboard.
Disk and File Recovery
Problem (FLR): You need to restore individual files
Solution: The A2 will automatically mount your disk images and expose them as a directory structure as you browse the restoreFS. This is described in detail here: http://docs.quadricsoftware.com/admin/6/restore.html
Problem (FSDISK): You need to restore one or more disks
Solution: Choose VHD, VHDX, or IMG disks to copy out of the restoreFS. See RESTFS. Use VHD if you wish to import the result into Xen. Use VHDX if you wish to import the result into Hyper-V. Use IMG if you need a raw disk image. IMG is slightly faster because its format has less overhead.
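The format choice above can be written down as a small lookup, which is handy if you script your restores. This is a minimal sketch only: the target labels ("xen", "hyper-v", "raw") and the function name are hypothetical and are not part of any Alike tooling.

```python
# Mapping taken from the FSDISK guidance above.
# The keys are hypothetical labels chosen for this illustration.
DISK_FORMAT_BY_TARGET = {
    "xen": "VHD",       # import the result into Xen
    "hyper-v": "VHDX",  # import the result into Hyper-V
    "raw": "IMG",       # raw disk image, slightly less format overhead
}


def choose_disk_format(target):
    """Return the restoreFS disk format to copy for a given restore target."""
    key = target.strip().lower()
    if key not in DISK_FORMAT_BY_TARGET:
        known = ", ".join(sorted(DISK_FORMAT_BY_TARGET))
        raise ValueError(f"unknown target {target!r}; expected one of: {known}")
    return DISK_FORMAT_BY_TARGET[key]
```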
Problem (UNCBAD): You have tried accessing the UNC path of the restoreFS at \\your.a2.ip\Restore but cannot log in
Solution: Try using an SCP client instead (e.g. WinSCP for Windows). Connect as the “alike” user with the password you have set for it (the default is “alike”). You can then find the restoreFS under /mnt/restore.
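If you prefer a command-line client over WinSCP, the same connection can be made with an OpenSSH-style scp client. The sketch below only builds the command line and does not run it. It assumes that an scp client is installed on your workstation. The host address, the path below /mnt/restore, and the local directory are placeholders, and the helper name is hypothetical.

```python
import shlex


def build_scp_command(a2_host, remote_path, local_dir, user="alike"):
    """Build an scp command that copies a restoreFS path to a local directory.

    The "alike" user and the /mnt/restore root come from the procedure above;
    everything else is supplied by the caller.
    """
    if remote_path != "/mnt/restore" and not remote_path.startswith("/mnt/restore/"):
        raise ValueError("remote_path must be inside /mnt/restore")
    # -r copies directories recursively.
    return ["scp", "-r", f"{user}@{a2_host}:{remote_path}", local_dir]


if __name__ == "__main__":
    # Placeholder values: replace with your A2 address and the path you need.
    command = build_scp_command("your.a2.ip", "/mnt/restore/0", "restored")
    print(shlex.join(command))
```

Run the printed command from a terminal and enter the password for the “alike” user when prompted.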
Problem (PRF): You have performance problems or other issues with both SCP and UNC
Solution: Try restoring directly to your ADS, rather than SCP or CIFS:
- Use console option 4 to drop to the shell.
- cd to /mnt/restore/0 for ADS FLR, /mnt/restore/1 for ODS FLR
- Use the Linux “cp” command to copy files from the restoreFS to your ADS location, which is /mnt/ads. You can make a new temp directory under /mnt/ads and copy files/disks there instead.
- If performance is also unsatisfactory over this mechanism, see also PRFRAM and PRFHARD.
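The copy step above can be sketched as a small script. On the A2 shell the plain “cp” command does the same job; this version is only an illustration, and it assumes a Python 3 interpreter is available wherever the copy runs, which may not be the case on the A2 itself. The staging directory name and the function name are hypothetical.

```python
import shutil
from pathlib import Path


def copy_to_ads(source, ads_root=Path("/mnt/ads"), staging_name="restore-tmp"):
    """Copy one file from the restoreFS into a staging directory under the ADS.

    /mnt/ads is the ADS location named in the procedure above.
    staging_name is a hypothetical temp directory name.
    """
    source = Path(source)
    staging = Path(ads_root) / staging_name
    staging.mkdir(parents=True, exist_ok=True)
    target = staging / source.name
    shutil.copyfile(source, target)
    # Cheap sanity check: a short copy usually means the transfer was interrupted.
    if target.stat().st_size != source.stat().st_size:
        raise OSError(f"size mismatch after copying {source} to {target}")
    return target
```

For example, copy_to_ads("/mnt/restore/0/<path to your disk>") would place the file in /mnt/ads/restore-tmp, where the path below /mnt/restore/0 depends on what you browse to in your own restoreFS.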
Problem (FSMNT): FLR partitions do not appear under the “0” and “1” subdirs described in the FLR documentation (see FLR).
Solution: It can take time for these partition directories to be exposed, especially for larger disks. cd in and out of this directory for a minute or two. You should then see “p0”, “p1”, etc.
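Rather than re-entering the directory by hand, you can poll it until the partition directories appear. The following is a rough sketch under stated assumptions: it assumes Python 3 is available on the machine that can see the restoreFS, and the function name, timeout, and polling interval are illustrative values, not Quadric recommendations.

```python
import re
import time
from pathlib import Path

# Partition directories are named "p0", "p1", and so on.
PARTITION_NAME = re.compile(r"^p\d+$")


def wait_for_partitions(flr_dir, timeout_s=120.0, interval_s=5.0):
    """Poll flr_dir until partition directories show up or the timeout expires.

    Returns the partition names found, sorted by number. The list is empty
    if nothing appeared in time.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            names = [e.name for e in Path(flr_dir).iterdir()
                     if PARTITION_NAME.match(e.name)]
        except OSError:
            # The directory may not be listable yet; treat it as empty and retry.
            names = []
        found = sorted(names, key=lambda name: int(name[1:]))
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(interval_s)
```

An empty result after the timeout is the situation covered by FSMNT2.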
Problem (FSMNT2): You have tried FSMNT and still no partitions show up. The FLR directory remains empty.
Solution: If you have tried FSMNT and it didn’t work, your filesystem likely cannot be mounted. This could be for a variety of reasons. The most common is volumes formatted using LVM, which is not supported for FLR. For LVM, any other unsupported filesystem, or any other FLR problem, the next step is to conduct a full restore or disk restore instead. See VMREST or RESTFS for instructions.
Problem (PRFRAM): You need other ways to improve performance
Solution: Increasing memory given to the A2 will allocate larger buffers for restore operations. This may accelerate some operations. See also PRFHARD.
Problem (PRFHARD): You need to improve performance and PRFRAM didn’t work or didn’t work enough
Solution: In an emergency situation, there are no other interventions Quadric Software can provide to accelerate your performance. Investigate your network and storage. Quadric Software can assist with isolating bottlenecks during business hours for non-restore emergency tickets only.
VM Full Restore
Problem (VMREST): You need to restore an entire VM
Solution: Use the Alike WebUI to conduct a full restore, described here: http://docs.quadricsoftware.com/admin/6/restore.html
Problem (VMERRABD): The full restore job fails with an ABD/networking error.
Solution: Run an ABD diagnostic job on the target pool to make sure an ABD is deployed and configured correctly: http://docs.quadricsoftware.com/kb/troubleshooting/abd-troubleshooting-guide/
Problem (VMERROTH): The full restore job fails and the error cannot immediately be corrected by ABD troubleshooting
Solution: Do not re-run a full restore job. Instead, copy the VHD file(s) from the RestoreFS and use XenCenter to import them. See RESTFS for the steps to copy VHD files from your A2's RestoreFS.
Problem (VMPRF): The performance of your full VM restore does not meet your expectations or will take too long to complete for your business objective
Solution: Use another restore approach, such as FLR. See also PRF, PRFRAM, and PRFHARD.
Problem (VMBOOT): The restored VM has boot issues
Solution: Quadric Software protects the state of the VM at time of backup, but cannot guarantee VM bootability in all circumstances. Problems within the VM at the time of the backup (e.g. corruption, virus, crypto-locker, etc.) can affect a restored VM's bootability. If this is the case, please attempt to restore from an earlier version. See also CORRF.
Problem (CORRF): The resulting VM has damaged or missing files
Solution: Damage to your A2's Data Store can in theory cause restore jobs that complete without error but produce invalid VMs. You can check the validity of the backup version (see DATAVAL). If the backup validates, your restore problem is likely not an Alike Software issue. Alike faithfully backs up the state of your disks at time of backup. Applications that are not equipped to handle crash-consistent backups (such as Exchange) should use a quiesced or VSS-enabled backup approach. Some applications may report “corrupt” databases even against valid backups, and will require a proprietary procedure to repair or replay transactions or database journals.
Problem (DATAVAL): How can you determine if your backup data is valid?
Solution: Alike stores your data in individual data blocks on your ADS. Each block is identified by its unique cryptographic signature. You can validate all block signatures for a backup from the A2 WebUI using the “validate” option. If validation fails, this likely indicates damage or missing data on disk.
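To illustrate the idea behind signature validation, here is a conceptual sketch of checking content-addressed blocks. It is not Alike's implementation or on-disk format. It assumes, purely for illustration, that each block is a file named after the SHA-256 hex digest of its contents; the actual signature algorithm and layout on your ADS are not stated in this document. Use the “validate” option in the A2 WebUI for real validation.

```python
import hashlib
from pathlib import Path


def find_bad_blocks(block_dir):
    """Return names of block files whose contents do not match their name.

    ASSUMPTION for illustration only: each block file is named after the
    SHA-256 hex digest of its bytes. This is not Alike's documented format.
    """
    bad = []
    for block in sorted(Path(block_dir).iterdir()):
        if not block.is_file():
            continue
        digest = hashlib.sha256(block.read_bytes()).hexdigest()
        if digest != block.name:
            # The stored bytes no longer match the signature they were filed under.
            bad.append(block.name)
    return bad
```

A block whose recomputed signature differs from its recorded one has been altered or truncated on disk, which is the kind of damage a failed validation points to.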
Problem (NONPROD): You have a problem with anything besides the restore of a production VM from a backup taken by Alike. This includes, but is not limited to, problems with Alike replicas, errors in backup jobs, vault issues, licensing, email notification issues, job warnings, etc.
Solution: Quadric Software only provides after-hours emergency support for restore of production machines from existing backups. All other procedures should be tested periodically by the customer. Quadric Software offers comprehensive support of these issues during its business hours.