Understanding Restore Performance


Starting in Alike A3 7.2 (builds >=8289), Alike increases performance for all restore paths. To get the most out of these improvements, it helps to understand how A3 restores work.


All restore paths load your data from block files housed in your Data Store (ADS or ODS). This is done by the Alike restoreFS, which exposes a filesystem under /mnt/restore in your A3 Docker container. Disk data shown on the restoreFS appears to be real: your VHD/X files are listed there with dates, sizes, and the like. But they are not real files or directories! The files and directories shown here are generated on demand by the restoreFS and consume no storage of their own.

How? As you browse and read these files, the restoreFS loads your deduplicated block files from your Data Store and creates everything on the fly. That means no matter what kind of file or disk you appear to be restoring, it's always backed by block files on your Data Store. It's important to understand this, because there are no actual VHD files saved on your Data Store. Whenever you read a VHD file, the restoreFS grabs the necessary data blocks and interprets them in real time. From the perspective of your storage device, it's just a series of random-access file reads. So even if your storage scores well on certain types of benchmarks, the devil is in the details. Many popular benchmarks measure other access patterns, such as sequential throughput, and predict restore performance poorly. Random-access read benchmarks are the best predictor of Alike restore performance.
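To make the idea concrete, here is a toy model of a synthetic file backed by deduplicated blocks. It is only a sketch and not the restoreFS implementation: the names ToyBlockStore, SyntheticFile, and ingest are hypothetical, and the 4-byte block size is chosen purely so the example is easy to follow.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose; real block sizes are in the megabyte range


class ToyBlockStore:
    """Hypothetical stand-in for a Data Store: unique blocks keyed by hash."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data  # an identical block is stored only once
        return key


class SyntheticFile:
    """A 'file' that holds only an ordered list of block keys, never the data."""

    def __init__(self, store, keys, size):
        self.store = store
        self.keys = keys
        self.size = size

    def read(self, offset: int, length: int) -> bytes:
        """Assemble the requested byte range from blocks at read time."""
        end = min(offset + length, self.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, inner = divmod(pos, BLOCK_SIZE)
            block = self.store.blocks[self.keys[index]]  # one block lookup
            take = min(BLOCK_SIZE - inner, end - pos)
            out += block[inner:inner + take]
            pos += take
        return bytes(out)


def ingest(store, data: bytes) -> SyntheticFile:
    """Split data into blocks, store them, and return the synthetic view."""
    keys = [store.put(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]
    return SyntheticFile(store, keys, len(data))
```

In this model, reading any range of the synthetic file turns into block lookups in the store, which is why the storage device only ever sees scattered block reads rather than one large sequential file.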

Now let's look at a few specific restore paths.

Image File Restore: VHD, VHDX, and raw disk (IMG) files are available from the restoreFS. You can access these synthetic disks via SCP or CIFS. As discussed, these disks are generated on-demand as you access them, and do not actually consume any storage. Instead, the random read speed of your Data Store predicts how quickly you can restore these files.

File Level Restore: File-level restore is one of the fastest ways to recover individual files. The A3 provides file-level restore by mounting a synthesized disk image from the restoreFS. Mounting occurs automatically whenever needed, making for seamless access to your files via the "File Browser" of the Alike WebUI, SCP, and CIFS. Again, your Data Store does not actually contain the files within your protected VMs. These files are generated dynamically when you access the restoreFS.

Full Restore: Full restores can be scheduled and launched from the Alike UI. All full restore paths read disk images created by the restoreFS. So again, disk images aren't stored on your Data Store; they are generated from its block files when you run a restore job. The faster the restoreFS can read from your Data Store, the faster your full restore can theoretically run.

However, full restores also have to write your disk data over your network to your destination. If you are experiencing performance problems with full restore, you need to determine whether the bottleneck is the restoreFS or the destination network and disk. You can find out by testing the restoreFS in isolation: copy an image off the restoreFS to /dev/null and note the speed. If that copy is fast, the network or the destination disk write speed is the problem. If that copy is slow, your Data Store is the bottleneck.
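The isolation test can also be timed with a short script. The sketch below reads a file from start to finish, discards the data (the same effect as copying to /dev/null), and reports throughput. The function name read_throughput_mb_s and the 2MB chunk size are assumptions made for illustration; pass it the path of a disk image you see under /mnt/restore.

```python
import time


def read_throughput_mb_s(path, chunk_size=2 * 1024 * 1024):
    """Read a file start to finish, discard the data, and return MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer layer; the OS may still cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)


# Example (replace the argument with a real image path on your restoreFS):
# print(read_throughput_mb_s("/mnt/restore/..."))
```

Compare the result with the rate your full restore job achieves. If the two numbers are close, the restoreFS side is the limit; if the script is much faster than the job, look at the network and the destination disk.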

Instant Restore: Instant restore, or instaboot, leverages on-demand disk images managed by the restoreFS to expose a special SR to Xen. This SR allows you to instantly boot protected VMs. Because no data needs to be copied to another SR, restores complete in seconds. Xen will read and write to this SR as you boot and use the VM. Because of the overhead of the restoreFS compared to physical disk, these I/O operations will be slower than your production SR. Use instaboot as a "spare tire" to get yourself out of sticky situations or conduct fire drills. Always migrate off the Alike SR to your production storage as soon as possible.


Summary: As discussed, all restore pathways go through the same pipeline to obtain your backup data. This pipeline is the restoreFS, which loads disk data from block data on your Data Store. All restore performance is based on random access to your Data Store blocks.

Tips: You can look at your data access patterns by monitoring the Alike WebUI Dashboard. The "Engine Details" graph shows "Engine I/O" in terms of data being read off the engine (blue up arrow), or data being written to the engine (green down arrow). By following the blue graph line, you can see the rate at which block data is being loaded and sent out. Check with your storage vendor to determine the best protocol for accessing your Data Store and tune it for 2MB random file access. If you believe your storage device will perform better with larger blocks, you can re-install your A3 with 4MB or 8MB blocks instead.
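If you want a rough number for 2MB random reads before talking to your storage vendor, a sketch like the following may help. It is not an Alike tool: random_read_mb_s is a hypothetical helper, and it assumes you have created a large test file yourself on the storage that backs your Data Store. It does not read Alike's own block files.

```python
import os
import random
import time

BLOCK_SIZE = 2 * 1024 * 1024  # 2MB, the access size named in the tips above


def random_read_mb_s(path, reads=200, block_size=BLOCK_SIZE, seed=None):
    """Read block-aligned chunks at random offsets and return MB/s."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("test file is empty")
    last_block = (size - 1) // block_size  # index of the final block
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            f.seek(rng.randint(0, last_block) * block_size)
            total += len(f.read(block_size))
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Use a test file considerably larger than the RAM of the machine running the script. Otherwise the operating system's cache will serve many of the reads and the result will look better than the storage really is. To compare block sizes, run it again with block_size set to 4MB or 8MB.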