Troubleshooting Storage Issues


Many common storage issues aren't shown directly in the Alike WebUI, but can still lead to job errors and failures. You can check for storage issues by looking in the Alike System Log, under Tools->System Logs->(Logfile: dropdown list)->System Log. However, this log rotates fairly aggressively, so the best way to check for all issues since the last boot is dmesg.


You can look at dmesg from the A2 console by selecting option 4 to drop to shell. Then type

dmesg -T

to show all messages. If you want to scroll, you can type

dmesg -T | less


Look for errors related to NFS or CIFS. These could indicate problems with your storage connection. Before opening a support case, it's always best to check for these, as support will likely first ask you to correct any storage-level issues before anything else.
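On a busy system the full dmesg output can be long, so it may help to filter it down to storage-related lines. The following is a minimal sketch, assuming grep is available in the A2 shell; the helper name storage_msgs is hypothetical and not part of Alike.

```shell
# Hypothetical helper: keep only lines that mention NFS or CIFS (case-insensitive).
storage_msgs() {
    grep -iE 'nfs|cifs'
}

# Filter the kernel log; print a note if nothing matched or dmesg could not be read.
dmesg -T | storage_msgs || echo "No NFS or CIFS messages found (or dmesg could not be read)."
```

An empty result does not rule out a storage problem; it only means the kernel log contains no lines mentioning NFS or CIFS since the last boot.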


Some issues, such as storage permission problems, won't show up in the dmesg log. These errors are rare, but they can cause jobs to fail. You can check for permission issues by looking at engine.log, available under Tools->System Logs->(Logfile: dropdown list)->Data Engine (Java). You can also access this log from the shell at /var/log/engine.log.
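From the shell, one way to review this log is to pull out only the most recent lines that mention an error. This is a sketch under assumptions: the helper name recent_errors and the default of 20 lines are illustrative choices, and the word "error" is only a rough filter.

```shell
# Hypothetical helper: show the most recent lines mentioning "error" in a log file.
# $1 = log file (defaults to the engine log), $2 = number of lines (defaults to 20).
recent_errors() {
    grep -i 'error' "${1:-/var/log/engine.log}" | tail -n "${2:-20}"
}

recent_errors /var/log/engine.log
```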


If this log shows many errors related to writing blocks, check the details of these errors. They may point to a permissions problem as the root cause. You can test permissions by navigating to /mnt/ads and attempting to touch a test file. If this test fails, the permissions on your ADS are likely incorrect, and you should log into your NAS or SAN to correct them.
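The touch test can be sketched as a small shell function. The function name ads_write_test and the test file name are hypothetical; any unused file name works, and the file is removed again if it was created.

```shell
# Hypothetical helper: try to create, then remove, a test file in the ADS mount.
# $1 = directory to test (defaults to /mnt/ads).
ads_write_test() {
    dir="${1:-/mnt/ads}"
    testfile="$dir/.write_test_$$"   # $$ is the shell's PID, used to avoid name clashes
    if touch "$testfile" 2>/dev/null; then
        rm -f "$testfile"
        echo "Write test succeeded in $dir"
    else
        echo "Write test failed in $dir: check the share's permissions on the NAS or SAN"
    fi
}

ads_write_test /mnt/ads
```

A failure here can also mean the share is not mounted or has gone stale, so it is worth checking the dmesg output as well before changing permissions.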


For example, the following error indicates a storage misconfiguration:


[Apr 29 03:26:06] blockvaulter[1340] Error processing block write#012quadric.blockvaulter.CloudException: java.io.FileNotFoundException: /mnt/ads/blocks/b5c/b5cfa9d6c8febd618f91ac2843d50a1c (Stale file handle)#012#011at quadric.spdb.FsAdapter.putBlock(FsAdapter.java:137)#012#011at quadric.blockvaulter.BandwidthCloudAdapter.putBlock(BandwidthCloudAdapter.java:41)#012#011at quadric.util.VaultUtil.putBlockFromMem(VaultUtil.java:392)#012#011at quadric.fuse.AmbHelper.doWriteBlock(AmbHelper.java:52)#012#011at quadric.socket.MungeServer.doSendBlock(MungeServer.java:466)...


The root cause is the java.io.FileNotFoundException portion of the message: the block file cannot be found because of a stale file handle (Stale file handle).

Similar root causes could include permissions issues, such as permission denied.
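To get a quick sense of which of these root causes appears in the log, and how often, the parenthesized reason can be counted. This is a rough sketch: the helper name root_causes is hypothetical, and it only looks for the two reasons mentioned here, so other causes will not be counted.

```shell
# Hypothetical helper: count occurrences of two known root-cause strings in a log.
# $1 = log file (defaults to the engine log).
root_causes() {
    grep -oE '\((Stale file handle|Permission denied)\)' "${1:-/var/log/engine.log}" | sort | uniq -c
}

root_causes /var/log/engine.log
```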

Root causes such as these must be resolved on the storage end and cannot be fixed within the A2 itself.