Foglight for Storage Management Shared 4.9 - User Guide

Understanding Status, Alarms, and Rules in Foglight for Storage Management

In Foglight for Storage Management, all storage devices and their child components are assigned a status, which enables you to see at a glance which entities in your storage infrastructure need attention. Status and alarms are controlled by rules that can be based on data, time, schedules, or events. The rule conditions define which values reflect acceptable behavior (Normal status) and which values warrant an alarm (Warning, Critical, or Fatal status). As the value associated with a monitored entity passes a condition threshold, Foglight for Storage Management generates an alarm and changes the entity’s status accordingly.
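As an illustration only, the relationship between rule thresholds and status can be sketched as follows. The metric and threshold values are hypothetical and are not Foglight defaults; actual rules are defined and edited in the product, and may be based on time, schedules, or events rather than data alone.

```python
from enum import IntEnum


class Status(IntEnum):
    NORMAL = 0
    WARNING = 1
    CRITICAL = 2
    FATAL = 3


# Hypothetical thresholds for an illustrative "percent used" metric.
# Ordered from most severe to least severe.
THRESHOLDS = [
    (95.0, Status.FATAL),
    (90.0, Status.CRITICAL),
    (80.0, Status.WARNING),
]


def evaluate(value: float) -> Status:
    """Return the most severe status whose threshold the value has reached."""
    for limit, status in THRESHOLDS:
        if value >= limit:
            return status
    return Status.NORMAL


print(evaluate(85.0).name)
```

A value below every threshold maps to Normal; a value that passes a threshold maps to the corresponding alarm severity.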

NOTE: When a component is selected, its detail views display the component’s Status followed by its State. Status is determined by Foglight for Storage Management as described above. State refers to the physical state of a component as reported by the vendor; if the vendor does not provide the physical state, the state is unknown. A component’s physical state may affect its status only when an enabled rule triggers alarms based on state. Consult with your Foglight Administrator if you want to enable or create rules that perform this check.

A storage device often has large numbers (thousands) of child components. With a few exceptions, alarms on child components do not change the status of the parent device. For example, a failed disk may have a Fatal status, but because arrays are designed to cope with a failed disk, the parent device continues to display a Normal status. The parent device status may be changed by child components in the following circumstances:

For information about changing default rules and alarm settings, see Managing Foglight for Storage Management Rules.

Reviewing the Status of All Devices

Use the Monitoring tab in the Storage Environment dashboard to gain a high-level understanding of the status of the devices in your environment, organized by device type. For a general description of the dashboard, see Introducing the Storage Environment Dashboard.

1. On the navigation panel, under Dashboards, click Storage & SAN > Storage Environment.
2. Click the Monitoring tab.

Assessing Storage Alarms

When a storage device or component enters an unacceptable state (as defined in a rule), the rule that monitors the entity triggers an alarm and sets the status of the resource. Examine the alarm messages starting with the most severe alarms.

This walkthrough assumes that you are looking at alarms of all severity levels in an Alarm Summary view. In many places in the software you can restrict your assessment to resources with the same alarm severity. This can be useful when you want to prioritize your alarm assessment, such as focusing on all storage arrays with Critical alarms first, and then on the Warning alarms.
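The prioritization described above (most severe alarms first, or one severity level at a time) can be sketched in a few lines. The `Alarm` record and its fields are hypothetical stand-ins for illustration; they do not represent a Foglight data model or API.

```python
from dataclasses import dataclass

# Lower number = examined earlier.
SEVERITY_ORDER = {"Fatal": 0, "Critical": 1, "Warning": 2}


@dataclass
class Alarm:
    source: str     # hypothetical: name of the affected device or component
    severity: str   # "Warning", "Critical", or "Fatal"
    message: str


def most_severe_first(alarms):
    """Order alarms so that Fatal comes before Critical, then Warning."""
    return sorted(alarms, key=lambda a: SEVERITY_ORDER[a.severity])


def with_severity(alarms, severity):
    """Restrict the assessment to alarms of a single severity level."""
    return [a for a in alarms if a.severity == severity]


alarms = [
    Alarm("array-1", "Warning", "Pool nearly full"),
    Alarm("array-2", "Fatal", "Disk failed"),
    Alarm("switch-1", "Critical", "Port error rate high"),
]
for alarm in most_severe_first(alarms):
    print(alarm.severity, alarm.source, alarm.message)
```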

Explore at Storage Alarm time. Select this option when you want to view diagrams and other details of the affected component at the time the alarm occurred. This option shows data for the time period leading up to and including the alarm time. For example, given an alarm time of 10:32 AM and the default four-hour time range, the diagnostic time range is set to 6:32 AM – 10:32 AM.
Explore at Default Diagnostic time. Select this option when you want to determine whether the situation causing the alarm persisted or resolved on its own. This option shows data before and after the alarm, with the alarm time positioned three quarters of the way into the time range. For example, given an alarm time of 10:32 AM and the default four-hour time range, the diagnostic time range is set to three hours before the alarm and one hour after the alarm, that is, 7:32 AM – 11:32 AM. If the current time falls within the range (for example, it is currently 11:05 AM), the time range ends at the current time, that is, 7:32 AM – 11:05 AM.
6. When you complete your investigation, in the breadcrumbs, click Storage Environment to return to the Choose Diagnostic Focus Time window. If desired, choose the other diagnostic time range.
Acknowledge. Continues to display the alarm, but it is marked as acknowledged until the alarm is triggered again. For example, for Warnings, an appropriate action may be to acknowledge the alarm and ignore it.
Acknowledge Until Normal. Continues to display the alarm, but it is marked as acknowledged until the affected component returns to the Normal status. This is useful when a component has failed and you want to know when it is replaced.
Clear. Deletes the alarm. Choose this option when the situation is resolved.
TIP: When you close the window, the time range returns to the time range in use before your alarm analysis. If it does not, in the Time Range either click the Frozen Time Range icon to return to real time or click the arrow to expand the zonar and set the range. For more information, see “Working in a Current or a Diagnostic Time Range” in the online help.
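The arithmetic behind the two diagnostic time ranges in this walkthrough (Explore at Storage Alarm time and Explore at Default Diagnostic time) can be reproduced with a short sketch. The function names and the calendar date are assumptions made for illustration; only the four-hour default, the three-quarters positioning, and the clipping at the current time come from the descriptions above.

```python
from datetime import datetime, timedelta

DEFAULT_RANGE = timedelta(hours=4)  # default time range stated in the guide


def storage_alarm_range(alarm_time, length=DEFAULT_RANGE):
    """Range that leads up to and ends at the alarm time."""
    return alarm_time - length, alarm_time


def default_diagnostic_range(alarm_time, now, length=DEFAULT_RANGE):
    """Range with the alarm three quarters of the way in, clipped at 'now'."""
    start = alarm_time - length * 0.75
    end = alarm_time + length * 0.25
    return start, min(end, now)


# Arbitrary example date; the times match the examples in the text.
alarm = datetime(2024, 1, 1, 10, 32)
print(storage_alarm_range(alarm))
print(default_diagnostic_range(alarm, now=datetime(2024, 1, 1, 11, 5)))
```

With a current time of 11:05 AM, the second call ends the range at 11:05 AM rather than 11:32 AM, matching the example in the option description.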

Monitoring Fabrics

Foglight for Storage Management provides insight into both physical and virtual fabrics available with Brocade and Cisco Fibre Channel (FC) switches. A physical fabric is a group of interconnected FC switches. The definition of a virtual fabric differs depending on the vendor:

Fabrics are displayed in the Fabrics quick view of the Storage Environment dashboard. When you expand a fabric branch to view its components, the list of components varies depending on the type of fabric as follows:

This walkthrough introduces the quick views for fabrics and their components.

2. Click the Fabrics tile to open the Fabrics quick view.
The Fabrics Summary (All Fabrics) panel opens. The Fabrics view identifies the top three fabrics with the highest average values for Data Rate, Link Error Rate, and Non-Link Error Rate, respectively. The FC Switches view identifies the top three switches in terms of the same metrics. The charts plot the metric values over the time period, while the tables show the average and current values for each component.
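As a rough illustration of how a "top three by average value" ranking such as the one in the Fabrics and FC Switches views could be derived, consider the sketch below. The fabric names and sample values are invented, and the guide does not describe how the product computes the ranking internally.

```python
from statistics import mean


def top_three(samples):
    """Rank components by the average of their samples; keep the top three.

    `samples` maps a component name to its metric values over the time period.
    Components with no samples are skipped.
    """
    averages = {name: mean(values) for name, values in samples.items() if values}
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:3]


# Hypothetical Data Rate samples per fabric.
data_rate = {
    "fabric-a": [10.0, 20.0],
    "fabric-b": [50.0, 70.0],
    "fabric-c": [5.0, 5.0],
    "fabric-d": [30.0, 30.0],
}
print(top_three(data_rate))
```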
In the Fabric Summary (Selected Fabric), the Related Inventory view contains alarm summaries for the selected fabric as well as its switches, ISL ports, N ports, and VSANs (Cisco fabrics only). The Resource Utilization charts display the following metrics for ISL ports (left) and N ports (right) used by the fabric:
Avg Utilization Distribution. For each type of port, displays aggregated values for Rcvd Utilization and Xmit Utilization grouped by percentage of usage. Most of your port utilization should be in the lower percentages. When there are ports performing at high utilization rates, you may want to investigate port performance further.
Data Rate. For each type of port, plots aggregated values for Data Receive Rate and Data Send Rate over the time period and displays the Baseline.
Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error Rate over the time period.
Cisco fabrics only: To investigate a VSAN used in the selected fabric, in the Related Inventory view, click VSANs or an alarm icon, and select a VSAN. See Exploring a Cisco VSAN.
Ports Average Utilization Distribution. For each type of port, displays aggregated values for Rcvd Utilization and Xmit Utilization grouped by percentage of usage. Most of your port utilization should be in the lower percentages. When there are ports on the switch performing at high utilization rates, you may want to investigate port performance further.
Rcv Rate. For each type of port, plots aggregated values for Data Receive Rate over the time period and displays the Baseline.
Xmit Rate. For each type of port, plots aggregated values for Data Send Rate over the time period and displays the Baseline.
Error Rate. For each type of port, plots aggregated values for Link Error Rate and Non-Link Error Rate over the time period.
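The utilization distribution charts described above group ports by percentage of usage. A minimal sketch of that grouping follows; the 10-percent bucket width is an assumption, because the guide does not state which bucket boundaries the charts use, and the input values are assumed to lie between 0 and 100.

```python
def utilization_distribution(utilizations, bucket_width=10):
    """Count ports per utilization bucket (0-10%, 10-20%, and so on).

    `utilizations` holds one utilization percentage (0-100) per port.
    A value of exactly 100 is counted in the last bucket.
    """
    n_buckets = 100 // bucket_width
    counts = [0] * n_buckets
    for pct in utilizations:
        index = min(int(pct // bucket_width), n_buckets - 1)
        counts[index] += 1
    return counts


# Hypothetical Rcvd Utilization values for five ports.
counts = utilization_distribution([5, 12, 18, 95, 100])
print(counts)
```

A healthy distribution has most of its counts in the lower buckets; counts in the highest buckets point to ports worth investigating further.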