Foglight for Storage Management Shared 4.6


	TIP: If you see alarms on devices or components that you think are operating within acceptable parameters, consider creating new rules to better suit your environment. For more information, see Managing Foglight for Storage Management Rules.

If you need more details to understand the issue, click the alarm message.

An Alarm window displays more information about the alarm and troubleshooting tips.


	TIP: To bypass the Alarm window and go straight to the Choose Diagnostic Focus Time window (described in the next step), click an instance name instead of the message.

From the Troubleshooting pane, click the Diagnose button.

Choose the time period to use as your diagnostic time range:

•

Explore at Storage Alarm time. Select this option when you want to view diagrams and other details of the affected component at the time the alarm occurred. Shows data for the time period leading up to and including the alarm time. For example, given an alarm time of 10:32 AM and the default four-hour time range, the diagnostic time range is set to 6:32 AM – 10:32 AM.

•

Explore at Default Diagnostic time. Select this option when you want to determine if the situation causing the alarm persisted or if it resolved on its own. Shows data before and after the alarm, with the alarm time positioned three-quarters of the way into the time range. For example, given an alarm time of 10:32 AM and the default four-hour time range, the diagnostic time range is set to three hours before the alarm and one hour after the alarm, that is, 7:32 AM – 11:32 AM. If the current time falls within that range (for example, it is currently 11:05 AM), the range ends at the current time, that is, 7:32 AM – 11:05 AM.
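The arithmetic behind the two options can be sketched as follows. This is an illustration only, not product code: the function names and the `span` parameter are hypothetical, and the four-hour default is taken from the examples above.

```python
from datetime import datetime, timedelta

def alarm_time_range(alarm_time, span=timedelta(hours=4)):
    """Explore at Storage Alarm time: the span leading up to and including the alarm."""
    return alarm_time - span, alarm_time

def default_diagnostic_range(alarm_time, now, span=timedelta(hours=4)):
    """Explore at Default Diagnostic time: the alarm sits three-quarters of the
    way into the span, and the end of the range never extends past `now`."""
    start = alarm_time - span * 3 / 4
    end = min(alarm_time + span / 4, now)
    return start, end

alarm = datetime(2024, 1, 1, 10, 32)  # hypothetical alarm at 10:32 AM
print(alarm_time_range(alarm))
print(default_diagnostic_range(alarm, now=datetime(2024, 1, 1, 11, 5)))
```

With an alarm at 10:32 AM, the first call yields 6:32 AM to 10:32 AM, and the second yields 7:32 AM to 11:05 AM because the current time caps the range.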

A component dashboard opens with its time range set to the selected diagnostic time range.

Review the component dashboard to better understand the data that led to the alarm. If you navigate to other dashboards, the diagnostic time range remains the same.

When you complete your investigation, in the breadcrumbs, click Storage Environment to return to the Choose Diagnostic Focus Time window. If desired, choose the other diagnostic time range.

When you are finished, close the Choose Diagnostic Focus Time window, and in the Alarm window click one of the following options:

•

Acknowledge. Continues to display the alarm, but it is marked as acknowledged until the alarm is triggered again. For example, for Warnings, an appropriate action may be to acknowledge the alarm and ignore it.

•

Acknowledge Until Normal. Continues to display the alarm, but it is marked as acknowledged until the affected component returns to the Normal status. This is useful when a component has failed and you want to know when it is replaced.

•

Clear. Deletes the alarm. Choose this option when the situation is resolved.

Close the Alarm window.


	TIP: When you close the window, the time range returns to the time range in use before your alarm analysis. If it does not, in the Time Range either click the Frozen Time Range icon to return to real time or click the arrow to expand the zonar and set the range. For more information, see “Working in a Current or a Diagnostic Time Range” in the online help.

Take action to resolve the issue in your storage infrastructure, either by yourself or by notifying the appropriate person.

Monitoring Fabrics

Foglight for Storage Management provides insight into both physical and virtual fabrics available with Brocade and Cisco Fibre Channel (FC) switches. A physical fabric is a group of interconnected FC switches. The definition of a virtual fabric differs depending on the vendor:

•

Brocade switches enable customers to group ports on physical switches into logical switches. Logical switches and physical switches can then be interconnected into virtual fabrics. Brocade creates logical ISL ports to interconnect logical switches. No metrics are available for LISL ports.

•

Cisco switches enable customers to create virtual storage area networks (VSANs) partitioned from a physical fabric. A VSAN is a logical group of ports, where the ports are located on one or more of the interconnected FC switches that form the physical fabric.

Fabrics are displayed in the Fabrics quick view of the Storage Environment dashboard. When you expand a fabric branch to view its components, the list of components varies depending on the type of fabric.

This walkthrough introduces the quick views for fabrics and their components.

To monitor fabrics, switches, and VSANs:

On the Storage Environment dashboard, ensure the Monitoring tab is selected.

Click the Fabrics tile to open the Fabrics quick view.


	TIP: You can also open this quick view from the navigation panel. For more information, see Introducing the Storage Explorer.

To identify the busiest fabrics in your environment, in the Fabrics list, click Summary.

The Fabrics Summary (All Fabrics) panel opens. The Fabrics view identifies the top three fabrics with the highest average values for Data Rate, Link Error Rate, and Non-Link Error Rate, respectively. The FC Switches view identifies the top three switches in terms of the same metrics. The charts plot the metric values over the time period, while the tables show the average and current values for each component.
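To make the "top three by highest average" ranking concrete, the following minimal sketch ranks components by the average of one metric over the time period and reports the current (most recent) value alongside it. The function name and the input shape are assumptions made for illustration; the guide does not describe how the product computes these values internally.

```python
from statistics import mean

def top_n_by_average(samples, n=3):
    """samples: dict mapping a component name to its metric values over the
    time period, oldest first. Returns (name, average, current) tuples,
    highest average first. Components with no samples are skipped."""
    rows = [(name, mean(vals), vals[-1]) for name, vals in samples.items() if vals]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

# Hypothetical Data Rate samples for four fabrics.
data_rate = {
    "Fabric A": [10, 20],
    "Fabric B": [50, 70],
    "Fabric C": [5, 5],
    "Fabric D": [30, 30],
}
print(top_n_by_average(data_rate))
```

The same function could be applied separately to Link Error Rate and Non-Link Error Rate samples to produce one top-three list per metric.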

To investigate one of the top three fabrics or switches:

•

To explore a top fabric, click its name in a table. See Exploring a Fabric.

•

To explore a top switch, click its name in a table. See Exploring a Switch.

•

To return to this quick view, in the breadcrumbs, click Storage Environment.

To monitor the performance of a fabric, in the Fabrics list, click a fabric name.

In the Fabric Summary (Selected Fabric), the Related Inventory view contains alarm summaries for the selected fabric as well as its switches, ISL ports, N ports, and VSANs (Cisco fabrics only). The Resource Utilization charts display the following metrics for ISL ports (left) and N ports (right) used by the fabric:

•

Avg Utilization Distribution. For each type of port, displays aggregated values for Rcvd Utilization and Xmit Utilization grouped by percentage of usage. Most of your port utilization should be in the lower percentages. When there are ports performing at high utilization rates, you may want to investigate port performance further.

•

Data Rate. For each type of port, plots aggregated values for Data Receive Rate and Data Send Rate over the time period and displays the Baseline.

•

Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error Rate over the time period.
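As an illustration of what "grouped by percentage of usage" means for the Avg Utilization Distribution chart, the sketch below counts port utilization values per percentage band. The 10-point band width and the function name are assumptions; the guide does not state which bands the chart uses.

```python
from collections import Counter

def utilization_distribution(utilizations, band_width=10):
    """Count utilization percentages (0-100) per band, for example '0-10%'.
    A value of exactly 100 is placed in the top band."""
    counts = Counter()
    for pct in utilizations:
        low = min(int(pct // band_width) * band_width, 100 - band_width)
        counts[f"{low}-{low + band_width}%"] += 1
    return dict(counts)

# Hypothetical Rcvd Utilization values for five ISL ports.
print(utilization_distribution([3, 8, 12, 95, 100]))
```

A healthy distribution has most of its counts in the lower bands; counts in the upper bands point to ports worth investigating.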

To continue investigating the selected fabric:

•

To explore details about the fabric, its switches, and its ports, click View in Explorer. See Exploring a Fabric.

•

To explore an FC switch in the selected fabric, in the Related Inventory view click FC Switches or an alarm icon, and select a switch. See Exploring a Switch.

•

To investigate a port used in the selected fabric, in the Related Inventory view click either ISL Ports or N Ports or an alarm icon, and select a port. See Investigating an FC Switch Port.

•

Cisco fabrics only: To investigate a VSAN used in the selected fabric, in the Related Inventory view click VSANs or an alarm icon, and select a VSAN. See Exploring a Cisco VSAN.

•

To return to the quick view, in the breadcrumbs, click Storage Environment.

To monitor the performance of a switch (physical or logical), in the Fabrics list, expand a fabric and click a switch.

In the FC Switch Summary, the Related Inventory view contains alarm summaries for the selected switch, the fabric it belongs to, and its ISL ports and N ports. The charts display the following metrics for ISL ports and N ports used by the switch:

•

Ports Average Utilization Distribution. For each type of port, displays aggregated values for Rcvd Utilization and Xmit Utilization grouped by percentage of usage. Most of your port utilization should be in the lower percentages. When there are ports on the switch performing at high utilization rates, you may want to investigate port performance further.

•

Rcv Rate. For each type of port, plots aggregated values for Data Receive Rate over the time period and displays the Baseline.

•

Xmit Rate. For each type of port, plots aggregated values for Data Send Rate over the time period and displays the Baseline.

•

Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error Rate over the time period.


	TIP: To identify port performance that has changed, look for values that fall outside the grey Baseline ribbon.
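The check described in the tip can be sketched as a simple filter. The sketch assumes each plotted point carries the lower and upper edges of the Baseline ribbon; that data shape and the function name are hypothetical.

```python
def outside_baseline(points):
    """points: iterable of (timestamp, value, baseline_low, baseline_high).
    Returns the (timestamp, value) pairs that fall outside the baseline band."""
    return [(ts, value) for ts, value, low, high in points
            if value < low or value > high]

# Hypothetical Rcv Rate samples with a baseline band of 40 to 60.
samples = [
    ("10:00", 50, 40, 60),
    ("10:05", 75, 40, 60),
    ("10:10", 30, 40, 60),
]
print(outside_baseline(samples))
```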

To continue investigating the selected switch:

•

To explore details about the switch and its ports, click View in Explorer. See Exploring a Switch.

•

To investigate a port used by the selected switch, in the Related Inventory view click ISL Ports or N Ports or an alarm icon, and then select a port. See Investigating an FC Switch Port.

•

To return to the quick view, in the breadcrumbs, click Storage Environment.

To monitor the performance of a Cisco VSAN, in the Fabrics list, expand a Cisco fabric and click a VSAN.

In the VSAN Summary, the Related Inventory view contains alarm summaries for the selected VSAN, the fabric it belongs to, and the ISL ports and N ports used by the VSAN. The Resource Utilization charts are the same as the charts displayed for a fabric, but the aggregated values include only the ports used by the VSAN.

To continue investigating the selected VSAN:

•

To explore details about the VSAN and its ports, click View in Explorer. See Exploring a Cisco VSAN.

•

To investigate a port used by the selected VSAN, in the Related Inventory view click ISL Ports or N Ports or an alarm icon, and then select a port. See Investigating an FC Switch Port.

•

To return to the quick view, in the breadcrumbs, click Storage Environment.

Monitoring Storage Arrays

When monitoring storage arrays, or when diagnosing alarms on storage arrays, you may need more detail on the array or its member nodes, controllers, ports, pools, LUNs, or disks. You may also want to determine if any of the child components have alarms that did not affect the array status. This walkthrough introduces the quick view for storage arrays.

To monitor storage arrays:

On the Storage Environment dashboard, ensure the Monitoring tab is selected.

Click the Storage Arrays tile to open the Storage Arrays quick view.


	TIP: You can also open this quick view from the navigation panel. For more information, see Introducing the Storage Explorer.

To identify the most problematic arrays in your environment, in the Storage Arrays list, click Summary.

The Storage Array Summary summarizes the capacity and performance health of the arrays in the environment.

•

Capacity

- The table is organized by arrays whose pools have the most significant, near-term capacity issues.

- The categories take into account the estimated time when pool capacity will be full, as well as the current available capacity and the over-provisioning state of the pool.

- To investigate, click on the array name. See Monitoring Storage Capacity.

•

Performance

- The table is organized by arrays with the most LUNs having latency issues.

- To investigate, click on the array name in the table to drill down to the Array Explorer. Click on the LUNs tab. See Investigating a LUN.
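To illustrate how an estimated time until a pool is full can drive the ordering of the Capacity table, the sketch below uses a straight-line projection from current available capacity and a daily growth rate. This is an assumption for illustration only: the guide does not describe the product's estimation method, and the function names and units are hypothetical.

```python
def days_until_full(available_gb, growth_gb_per_day):
    """Linear projection of when available capacity runs out.
    Returns None when the pool is not growing."""
    if growth_gb_per_day <= 0:
        return None
    return available_gb / growth_gb_per_day

def rank_pools(pools):
    """pools: dict mapping a pool name to (available_gb, growth_gb_per_day).
    Returns pool names ordered soonest-full first; pools that are not
    growing sort last."""
    def sort_key(item):
        days = days_until_full(*item[1])
        return float("inf") if days is None else days
    return [name for name, _ in sorted(pools.items(), key=sort_key)]

# Hypothetical pools: available capacity in GB and growth in GB per day.
print(rank_pools({"pool-a": (100, 5), "pool-b": (10, 5), "pool-c": (50, 0)}))
```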

To monitor a storage array, in the Storage Arrays list, click a storage array name.

The content of the quick view varies depending on the selected storage array. For most storage arrays (excluding Dell EqualLogic and EMC Isilon), the quick view contains the following embedded views:

•

Related Inventory. Contains alarm summaries for the selected storage array and its controllers, FC ports, IP ports, pools, LUNs, and disks.

•

Storage Capacity Summary. Displays current values for Total Advertised LUNs Size and Capacity Provisioned to LUNs.

•

Controller Performance. Plots % Busy values by controller over the time period and displays threshold lines (defined in registry variable StSAN.Controller.PctBusyThreshold).

•

Pools with Severe or High Pressure on Available Usable Capacity (or Raw Capacity)

- Displays the pools that have the most significant, near-term capacity issues.

- Shows the available usable capacity or available raw capacity in the table, depending on the data provided from the device vendor, and the % available.

- Shows the estimated time when the pool capacity will be full.

- Over commitment is not shown when raw capacity numbers are displayed.

- The cylinders are colored to show the % of available capacity in the pool.

•

LUNs/Disks States. Plots the percentage of disks and LUNs in the storage array in problem states. Problem states are reported by the vendor. Resolving these issues may improve LUN performance.

For more storage array quick views, see the Summary tab description under Dell EqualLogic Storage Array and EMC VPLEX Storage Array.

To continue investigating the selected storage array:

•

To explore details about the storage array and its child components, click View in Explorer. See Exploring a Storage Array.

•

To investigate a child component, in the Related Inventory view click a component type or a status icon, and then select a component. A component dashboard opens. For help with the dashboard, see one of the following topics:

- Investigating a Controller

- Investigating an EqualLogic Member

- Investigating an Isilon Node

- Investigating an Array/Filer Port

- Investigating a Pool

- Investigating a LUN

- Investigating a Physical Disk

•

To return to this quick view, in the breadcrumbs, click Storage Environment.

Monitoring Filers

When monitoring filers, or when diagnosing alarms on filers, you may need more detail on a filer and its controllers, ports, NASVolumes, LUNs, aggregates, or disks. You may also want to determine if any child components display alarms. This walkthrough introduces the quick views for filers.

To monitor filers:

On the Storage Environment dashboard, ensure the Monitoring tab is selected.

Click the Filers tile to open the Filers quick view.


	TIP: You can also open this quick view from the navigation panel. For more information, see Introducing the Storage Explorer.

To identify the busiest filers in your environment, in the Filers list, click Summary.

The Filer Summary view identifies the top three filers in two categories:

•

Filers with Lowest % of Free Disk/Spares Capacity (raw). Displays cylinders showing the amount of used Free Disk/Spares Capacity (Raw). Below each cylinder, you can see total and free capacity.

•

Filers with Lowest % of Available Aggr Capacity (usable). Displays cylinders showing the amount of used Aggr Capacity (Usable) Free. Below each cylinder, you can see total and free capacity.

If you find that cylinder colors do not reflect the acceptable thresholds in your environment, you can ask your Foglight for Storage Management Administrator to edit the threshold values in the following registry variables:

•

Disk capacity: StSAN.FilerDisks.PctUnallocatedCapacityThreshold.[Fatal|Critical|Warning]

•

Aggregate capacity: StSAN.FilerAggregates.PctUnallocatedCapacityThreshold.[Fatal|Critical|Warning]


	NOTE: Registry variables are global variables that are often referenced by rules. Before editing a registry variable, ensure that the edit will not cause unintended changes to how the affected rules trigger alarms. For more information, search for “Registry Variable” in the online help.
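The sketch below shows one way the three threshold levels could map a percentage of unallocated capacity to a severity. It assumes that a severity applies when the percentage is at or below the corresponding threshold and that Fatal is checked first; the threshold values shown are hypothetical and are not the product defaults.

```python
def capacity_severity(pct_unallocated, thresholds):
    """thresholds: dict with 'Fatal', 'Critical', and 'Warning' keys holding
    percentages, as in the .[Fatal|Critical|Warning] registry variables.
    Returns the most severe level whose threshold is met, else 'Normal'."""
    for level in ("Fatal", "Critical", "Warning"):
        if pct_unallocated <= thresholds[level]:
            return level
    return "Normal"

# Hypothetical threshold values, for illustration only.
disk_thresholds = {"Fatal": 5, "Critical": 10, "Warning": 20}
print(capacity_severity(8, disk_thresholds))
```

Raising or lowering a threshold changes which band a given filer falls into, which is why the note above recommends checking the affected rules before editing.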

To monitor a filer, in the Filers list, click a filer name.

•

Storage Capacity Summary. The cylinders show the amount of capacity consumed in the filer, expressed using current values for the following pairs of metrics:

- Disk Capacity (Raw) and Free Disk/Spares Capacity (Raw)
- Aggr Capacity (Raw) Total and Aggr Capacity (Raw) Free
- Aggr Capacity (Usable) Total and Aggr Capacity (Usable) Free

The bar chart displays current values for Advertised LUN Size, Advertised NASVolumes Size, Aggr Capacity (Usable) Total, and Aggr Capacity (Usable) Free.

•

NASVolume Performance. Plots the percentage of NASVolumes in the filer in problem states. Problem states are reported by the vendor. Resolving issues may improve volume performance.

•

LUN Performance. Plots the percentage of LUNs in the filer in problem states. Problem states are reported by the vendor. Resolving issues may improve LUN performance.