Foglight for Storage Management Shared 4.5.5 - User and Reference Guide

Assessing Storage Alarms

When a storage device or component enters an unacceptable state (as defined in a rule), the rule that monitors the entity triggers an alarm and sets the status of the resource. Examine the alarm messages starting with the most severe alarms.

This walkthrough assumes that you are looking at alarms of all severity levels in an Alarm Summary view. In many places in the software you can restrict your assessment to resources with the same alarm severity. This can be useful when you want to prioritize your alarm assessment, such as focusing on all storage arrays with Critical alarms first, and then on the Warning alarms.

Explore at Storage Alarm time. Select this option when you want to view diagrams and other details of the affected component at the time the alarm occurred. This option shows data for the time period leading up to and including the alarm time. For example, given an alarm time of 10:32 AM and the default four-hour time range, the diagnostic time range is set to 6:32 AM – 10:32 AM.
Explore at Default Diagnostic time. Select this option when you want to determine whether the situation causing the alarm persisted or resolved on its own. This option shows data before and after the alarm, with the alarm time positioned three quarters of the way into the time range. For example, given an alarm time of 10:32 AM and the default four-hour time range, the diagnostic time range is set to three hours before the alarm and one hour after the alarm, that is, 7:32 AM – 11:32 AM. If the current time falls within that range (for example, it is currently 11:05 AM), the range ends at the current time: 7:32 AM – 11:05 AM.
6. When you complete your investigation, in the breadcrumbs, click Storage Environment to return to the Choose Diagnostic Focus Time window. If desired, choose the other diagnostic time range.
Acknowledge. Continues to display the alarm, but it is marked as acknowledged until the alarm is triggered again. For example, for Warnings, an appropriate action may be to acknowledge the alarm and ignore it.
Acknowledge Until Normal. Continues to display the alarm, but it is marked as acknowledged until the affected component returns to the Normal status. This is useful when a component has failed and you want to know when it is replaced.
Clear. Deletes the alarm. Choose this option when the situation is resolved.
TIP: When you close the window, the time range returns to the time range in use before your alarm analysis. If it does not, in the Time Range either click the Frozen Time Range icon to return to real time or click the arrow to expand the zonar and set the range. For more information, see “Working in a Current or a Diagnostic Time Range” in the online help.
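
The arithmetic behind the two diagnostic time ranges described above can be sketched in a few lines of Python. This is an illustration only, not code from the product: the function names, the `span` parameter, and the example date are hypothetical, and the sketch assumes the default four-hour time range.

```python
from datetime import datetime, timedelta

def storage_alarm_range(alarm_time, span=timedelta(hours=4)):
    """Range that leads up to and ends at the alarm time (hypothetical helper)."""
    return alarm_time - span, alarm_time

def default_diagnostic_range(alarm_time, now, span=timedelta(hours=4)):
    """Range with the alarm three quarters of the way in (hypothetical helper).

    The end is cut off at the current time when the full range would
    extend into the future.
    """
    start = alarm_time - (span * 3) / 4
    end = min(start + span, now)
    return start, end

# Example date is arbitrary; only the times of day come from the text above.
alarm = datetime(2024, 1, 15, 10, 32)
print(storage_alarm_range(alarm))
print(default_diagnostic_range(alarm, now=datetime(2024, 1, 15, 11, 5)))
```

With an alarm time of 10:32 AM, the first call yields 6:32 AM – 10:32 AM, and the second yields 7:32 AM – 11:05 AM because the current time (11:05 AM) falls inside the full 7:32 AM – 11:32 AM range.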

Monitoring Fabrics

Foglight for Storage Management provides insight into both physical and virtual fabrics available with Brocade and Cisco Fibre Channel (FC) switches. A physical fabric is a group of interconnected FC switches. The definition of a virtual fabric differs depending on the vendor:

Fabrics are displayed in the Fabrics quick view of the Storage Environment dashboard. When you expand a fabric branch to view its components, the list of components varies depending on the type of fabric as follows:

This walkthrough introduces the quick views for fabrics and their components.

2. Click the Fabrics tile to open the Fabrics quick view.
The Fabrics Summary (All Fabrics) panel opens. The Fabrics view identifies the top three fabrics with the highest average values for Data Rate, Link Error Rate, and Non-Link Error Rate, respectively. The FC Switches view identifies the top three switches in terms of the same metrics. The charts plot the metric values over the time period, while the tables show the average and current values for each component.
In the Fabric Summary (Selected Fabric), the Related Inventory view contains alarm summaries for the selected fabric as well as its switches, ISL ports, N ports, and VSANs (Cisco fabrics only). The Resource Utilization charts display the following metrics for ISL ports (left) and N ports (right) used by the fabric:
Avg Utilization Distribution. For each type of port, displays aggregated values for Rcvd Utilization and Xmit Utilization grouped by percentage of usage. Most of your port utilization should be in the lower percentages. When there are ports performing at high utilization rates, you may want to investigate port performance further.
Data Rate. For each type of port, plots aggregated values for Data Receive Rate and Data Send Rate over the time period and displays the Baseline.
Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error Rate over the time period.
Cisco fabrics only: To investigate a VSAN used in the selected fabric, in the Related Inventory view click VSANs or an alarm icon, and select a VSAN. See Exploring a Cisco VSAN.
Ports Average Utilization Distribution. For each type of port, displays aggregated values for Rcvd Utilization and Xmit Utilization grouped by percentage of usage. Most of your port utilization should be in the lower percentages. When there are ports on the switch performing at high utilization rates, you may want to investigate port performance further.
Rcv Rate. For each type of port, plots aggregated values for Data Receive Rate over the time period and displays the Baseline.
Xmit Rate. For each type of port, plots aggregated values for Data Send Rate over the time period and displays the Baseline.
Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error Rate over the time period.
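
The utilization distribution charts described above group port utilization values by percentage of usage. The following Python sketch shows one way such a grouping could be computed. It is an assumption-laden illustration: the 10-percent bucket width, the function name, and the sample values are hypothetical and are not taken from the product.

```python
from collections import Counter

def utilization_distribution(utilizations, bucket_width=10):
    """Count ports per utilization bucket (hypothetical helper).

    Each key is the lower bound of a bucket in percent. A value of
    exactly 100 is counted in the top bucket.
    """
    counts = Counter()
    top_bucket = 100 - bucket_width
    for pct in utilizations:
        lower = min(int(pct // bucket_width) * bucket_width, top_bucket)
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical per-port utilization percentages.
sample = [5, 12, 18, 95, 100]
print(utilization_distribution(sample))  # {0: 1, 10: 2, 90: 2}
```

In this sample most ports fall in the lower buckets, but two ports sit in the 90-100 percent bucket, which is the kind of result that would prompt a closer look at port performance.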

Monitoring Storage Arrays

When monitoring storage arrays, or when diagnosing alarms on storage arrays, you may need more detail on the array or its member nodes, controllers, ports, pools, LUNs, or disks. You may also want to determine if any of the child components have alarms that did not affect the array status. This walkthrough introduces the quick view for storage arrays.

2. Click the Storage Arrays tile to open the Storage Arrays quick view.
Related Inventory. Contains alarm summaries for the selected storage array and its controllers, FC ports, IP ports, pools, LUNs, and disks.
Controller Performance. Plots % Busy values by controller over the time period and displays threshold lines (defined in the registry variable StSAN.Controller.PctBusyThreshold).
LUNs/Disks States. Plots the percentage of disks and LUNs in the storage array in problem states. Problem states are reported by the vendor. Resolving these issues may improve LUN performance.

Monitoring Filers

When monitoring filers, or when diagnosing alarms on filers, you may need more detail on a filer and its controllers, ports, NASVolumes, LUNs, aggregates, or disks. You may also want to determine if any child components display alarms. This walkthrough introduces the quick views for filers.

2. Click the Filers tile to open the Filers quick view.
Filers with Lowest % of Free Disk/Spares Capacity (raw). Displays cylinders showing the amount of used Free Disk/Spares Capacity (Raw). Below each cylinder, you can see total and free capacity.
Filers with Lowest % of Available Aggr Capacity (usable). Displays cylinders showing the amount of used Aggr Capacity (Usable) Free. Below each cylinder, you can see total and free capacity.
Disk capacity: StSAN.FilerDisks.PctUnallocatedCapacityThreshold.[Fatal|Critical|Warning]
Aggregate capacity: StSAN.FilerAggregates.PctUnallocatedCapacityThreshold.[Fatal|Critical|Warning]
Storage Capacity Summary. The cylinders show the amount of capacity consumed in the filer, expressed using current values for the following pairs of metrics:
NASVolume Performance. Plots the percentage of NASVolumes in the filer in problem states. Problem states are reported by the vendor. Resolving issues may improve volume performance.
LUN Performance. Plots the percentage of LUNs in the filer in problem states. Problem states are reported by the vendor. Resolving issues may improve LUN performance.
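
The capacity threshold registry variables listed above come in Fatal, Critical, and Warning variants. The Python sketch below illustrates how a three-level threshold set of this kind could map a percentage of unallocated capacity to a severity. It is not the product's rule logic: the function name and the threshold values are hypothetical, and the sketch assumes that a lower unallocated percentage is worse and that a value at or below a threshold triggers that severity.

```python
def capacity_severity(pct_unallocated, thresholds):
    """Return the most severe level whose threshold is reached (hypothetical helper).

    Assumes a severity applies when the unallocated capacity percentage
    is at or below that severity's threshold.
    """
    for severity in ("Fatal", "Critical", "Warning"):
        limit = thresholds.get(severity)
        if limit is not None and pct_unallocated <= limit:
            return severity
    return "Normal"

# Hypothetical threshold values, not product defaults.
disk_thresholds = {"Fatal": 2, "Critical": 5, "Warning": 10}

print(capacity_severity(25, disk_thresholds))  # Normal
print(capacity_severity(8, disk_thresholds))   # Warning
print(capacity_severity(1, disk_thresholds))   # Fatal
```

Checking the levels from most to least severe means a value that crosses several thresholds is reported once, at the highest severity it reaches.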