Foglight 5.9.1 - Performance Tuning Field Guide


Derivation Related Issues

The Derivation service (com.quest.nitro:service=Derivation) is most often the cause of diagnostic snapshots that are 500 to 600 MB in size. A snapshot this large indicates that you may have an excessive number of derivations.

Identifying and Resolving Derivation Related Issues

Search the file for the regular expression "with .* derivation rulettes". The results should resemble the following:

DATA_DRIVEN with 115 derivation rulettes

DATA_DRIVEN with 115 derivation rulettes

DATA_DRIVEN with 187230 derivation rulettes

DATA_DRIVEN with 97 derivation rulettes

DATA_DRIVEN with 115 derivation rulettes

The third line (187230 derivation rulettes) indicates an issue because its count is far larger than any other count listed. Locate the complex derivation definition, which appears above the "with .* derivation rulettes" match in the file, and make a note of the topology type and metric name. For example:

Complex derivation definition: DBSS_Total_Elapsed_Time_Per_Exec (null) : DerivationCalculation for DBSS_Top_Sql

where DBSS_Top_Sql is the topology type and DBSS_Total_Elapsed_Time_Per_Exec is the metric name.
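The search described above can be scripted. The following is a minimal Python sketch, not a Quest-supplied tool. It assumes the snapshot lines look exactly like the samples shown here. The function names and the factor of 100 used to decide what counts as "significantly larger" are illustrative assumptions, not documented Foglight values.

```python
import re

# Matches lines such as "DATA_DRIVEN with 187230 derivation rulettes".
RULETTE_LINE = re.compile(r"(\w+) with (\d+) derivation rulettes")

# Matches the definition line format shown in the example above (assumed stable).
DEFINITION_LINE = re.compile(
    r"Complex derivation definition: (\S+) .*DerivationCalculation for (\S+)"
)

def find_rulette_outliers(lines, factor=100):
    """Return (line_number, count) pairs whose count is >= factor x the median.

    `factor` is a hypothetical threshold chosen for illustration only.
    """
    counts = []
    for number, line in enumerate(lines, start=1):
        match = RULETTE_LINE.search(line)
        if match:
            counts.append((number, int(match.group(2))))
    if not counts:
        return []
    ordered = sorted(count for _, count in counts)
    median = ordered[len(ordered) // 2]
    return [(n, c) for n, c in counts if c >= factor * max(median, 1)]

def parse_definition(line):
    """Return (metric_name, topology_type), or None if the line does not match."""
    match = DEFINITION_LINE.search(line)
    return (match.group(1), match.group(2)) if match else None

sample = [
    "DATA_DRIVEN with 115 derivation rulettes",
    "DATA_DRIVEN with 115 derivation rulettes",
    "DATA_DRIVEN with 187230 derivation rulettes",
    "DATA_DRIVEN with 97 derivation rulettes",
    "DATA_DRIVEN with 115 derivation rulettes",
]
outliers = find_rulette_outliers(sample)

definition = parse_definition(
    "Complex derivation definition: DBSS_Total_Elapsed_Time_Per_Exec (null) : "
    "DerivationCalculation for DBSS_Top_Sql"
)
```

With the sample lines, only the third entry is reported, and the definition line yields the metric name and topology type to pass on to Quest Support.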

If there is evidence of derivation problems, contact Quest Support with this information for further assistance and to determine if the issue has been resolved in the latest version of the affected cartridge.

---- com.quest.nitro:service=Topology

This is the topology service.

The most useful part of this service is in the extra information, which lists all topology types and, for each type, the number of instances, the number of instance versions, the maximum versions, and the effective instance versions. This information helps you determine if the topology is too large, or if there is a topology churn. Look for a high number of versions of instances, as well as a high maximum versions.

The topology table has the following six columns:

1. Topology Type: name of the topology object type.

2. Num Instance Version: number of versions of all instances of this topology type combined.

3. Max Version: version number of the single most changing instance.

4. Num Effective: number of active instances of this object.

5. Num Recent Versions: number of new versions of all instances in the last seven days.

6. Num Recent Instances: number of new instances created in the last seven days.

Topology Related Issues

Topology churn is defined as the constant changing and creation of new versions of existing topology objects. Each time a property is updated on an instance, a new version of that instance is created. Topology churn can cause high CPU usage as the Management Server propagates the changes across the rest of the topology model.

Topology growth is defined as the continuous creation of new instances of a type of topology object. Topology growth can cause high CPU usage as models and rulettes are updated, as well as increased JVM heap usage. The entire topology model is stored in memory, so as the number of objects added increases, so does the heap usage.

Identifying and Resolving Topology Churn and Growth

If the values in columns five and six in the table above are greater than 5000, examine the highest numbers and work your way down the list. Resolving issues with the higher ones can sometimes resolve other churn issues, since topology changes to one object can cause changes in other objects.

For example, consider the sample rows of the topology table below:

| DBO_Alert_Log | 76 | 2 | 38 | 76 | 38 |

This is an example of a good model. There are 38 instances (column 4), with a maximum of 2 versions (column 3), for a total of 76 versions (column 2). The numbers are in balance.

| DBO_Datafile | 5816 | 2 | 2908 | 2 | 1 |

This is also an example of a good, stable model. Even though the numbers are higher, 2908 x 2 = 5816, so the numbers are in balance. Additionally, in the last 7 days, there was only 1 new object, with 2 changes. There is no large growth or churn in this example.

| DBO_Undo_Activity_Info | 393761 | 16810 | 39 | 0 | 0 |

This is an example of a model that was bad but has become good. There are 393761 total versions in history, but no new changes (0) in the last 7 days.

| HostNetwork | 238231 | 4472 | 846 | 234543 | 42 |

This is a bad topology model. A large number (234543) of new versions have been created in the past 7 days.

| VMWESXServerPhysicalDisk | 28652 | 3 | 3530 | 10590 | 5295 |

This is also a bad model. In the past 7 days, 5295 new instances have been created. Column 4 indicates that some stale object cleanup has been done, but unless the root cause is found, the instances will keep being created.
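The checks walked through in these examples can be expressed as a short Python sketch. It assumes rows are pipe-delimited exactly as in the samples above and applies the 5000 threshold to columns five and six. The class, function, and label names ("churn", "growth") are illustrative assumptions, not part of the diagnostic snapshot format.

```python
from typing import NamedTuple

class TopologyRow(NamedTuple):
    topology_type: str
    num_instance_versions: int   # column 2
    max_version: int             # column 3
    num_effective: int           # column 4
    num_recent_versions: int     # column 5
    num_recent_instances: int    # column 6

THRESHOLD = 5000  # value taken from the guidance above

def parse_row(line):
    """Parse one pipe-delimited topology table row."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return TopologyRow(cells[0], *(int(cell) for cell in cells[1:6]))

def classify(row):
    """Label a row with 'churn' and/or 'growth' when a recent count exceeds the threshold."""
    findings = []
    if row.num_recent_versions > THRESHOLD:
        findings.append("churn")
    if row.num_recent_instances > THRESHOLD:
        findings.append("growth")
    return findings

sample = [
    "| DBO_Alert_Log | 76 | 2 | 38 | 76 | 38 |",
    "| DBO_Datafile | 5816 | 2 | 2908 | 2 | 1 |",
    "| DBO_Undo_Activity_Info | 393761 | 16810 | 39 | 0 | 0 |",
    "| HostNetwork | 238231 | 4472 | 846 | 234543 | 42 |",
    "| VMWESXServerPhysicalDisk | 28652 | 3 | 3530 | 10590 | 5295 |",
]
rows = [parse_row(line) for line in sample]

# Examine the highest recent counts first, then work down the list.
flagged = sorted(
    (row for row in rows if classify(row)),
    key=lambda row: max(row.num_recent_versions, row.num_recent_instances),
    reverse=True,
)
report = [(row.topology_type, classify(row)) for row in flagged]
```

For the five sample rows, only HostNetwork and VMWESXServerPhysicalDisk are flagged, which matches the two bad models described above.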

If there is evidence of topology problems, contact Quest Support with this information for further assistance and to determine if the issue has been resolved in the latest version of the affected cartridge.

---- com.quest.nitro:service=DataCacheEviction

This service lists metrics that are being held in the JVM waiting to be written to the database permanently. This information is located in the Cache Policies section of the diagnostic snapshot.

If many metrics (thousands) are held in memory for long periods of time, the garbage collector (GC) cannot clean them up because they are still live objects. As a result, a large portion of memory is consumed by data that should have been written to the database instead. This can lead to JVM heap exhaustion and performance problems.

The following is an example of the Cache Policies section of the diagnostic snapshot:

Cache Policies:

cbc82b6a-1f8c-4fa8-a88a-fcb07af2854e:file_physical_io_pct - age:259200000 granularity:300000 cached duration:259500000 num values:123 delay:192764

c25e1d2-c20b-400e-b87c-b9749c28899a:DBO_File_Avg_Read_Time_Ms - age:259200000 granularity:300000 cached duration:259500000 num values:122 delay:43089

1a6c1537-3050-499d-9e93-1f702b1ab77f:file_read_time - age:259200000 granularity:300000 cached duration:259500000 num values:140 delay:11026

1279f9c8-c72d-4deb-9080-73eeed73d70a:DBO_Datafile_File_Write_Requests_Rate - age:259200000 granularity:300000 cached duration:259500000 num values:118 delay:12308

71d83485-c441-45e9-9cc6-ccd3bb510f71:file_physical_writes - age:259200000 granularity:300000 cached duration:259500000 num values:136 delay:37018

Each line can be broken down as follows:

cbc82b6a-1f8c-4fa8-a88a-fcb07af2854e (topology object ID)

file_physical_io_pct (name of the metric)

age:259200000 (length of time the metric is kept in memory, in ms)

granularity:300000 (rawness of the metric value, in ms)

num values:123 (number of values of this metric on this object)

delay:192764 (length of time the metric has been in memory)
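To review many Cache Policies entries at once, each line can be split into its fields programmatically. The following Python sketch assumes every entry follows the layout of the sample lines above. The function name and dictionary keys are illustrative choices, and the conversion to days and minutes relies only on the age and granularity values being in milliseconds, as stated above.

```python
import re

# Field layout assumed from the sample Cache Policies lines above.
CACHE_LINE = re.compile(
    r"(?P<object_id>[0-9a-f-]+):(?P<metric>\S+) - "
    r"age:(?P<age>\d+) granularity:(?P<granularity>\d+) "
    r"cached duration:(?P<cached_duration>\d+) "
    r"num values:(?P<num_values>\d+) delay:(?P<delay>\d+)"
)

MS_PER_MINUTE = 60_000
MS_PER_DAY = 86_400_000

def parse_cache_policy(line):
    """Return the fields of one Cache Policies line as a dict, or None if it does not match."""
    match = CACHE_LINE.match(line.strip())
    if not match:
        return None
    fields = match.groupdict()
    for key in ("age", "granularity", "cached_duration", "num_values", "delay"):
        fields[key] = int(fields[key])
    return fields

entry = parse_cache_policy(
    "cbc82b6a-1f8c-4fa8-a88a-fcb07af2854e:file_physical_io_pct - "
    "age:259200000 granularity:300000 cached duration:259500000 "
    "num values:123 delay:192764"
)
age_days = entry["age"] / MS_PER_DAY
granularity_minutes = entry["granularity"] / MS_PER_MINUTE
```

For the sample entry, the age works out to 3 days and the granularity to 5 minutes. Counting the parsed entries per metric name is one way to see which metrics dominate the cache.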

You can search the diagnostic snapshot for the metric name, and locate its parent topology in the XML schema. For example, for the metric detailed above:

<property name='file_physical_io_pct' type='Metric' is-many='false' is-containment='true' unit-name='count'>
<annotation name='UnitEntityName' value='percents'/>
</property>

This metric is contained in the following XML tag:

<type name='DBO_Datafile_IO_Activity'
extends='DBO_Instance_Alarm_Object'>

This indicates that the file_physical_io_pct metric belongs to the DBO_Datafile_IO_Activity type, which is part of the DBO topology.
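The lookup from a metric name to its parent type can also be sketched in Python with the standard xml.etree.ElementTree module. The overall layout of the schema inside a diagnostic snapshot is not described here, so the enclosing types element in the sample below is a hypothetical wrapper added only to make the fragment well-formed. The function name is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# The <types> root is a hypothetical wrapper; the inner elements are the
# fragments shown above.
SAMPLE_SCHEMA = """
<types>
  <type name='DBO_Datafile_IO_Activity' extends='DBO_Instance_Alarm_Object'>
    <property name='file_physical_io_pct' type='Metric' is-many='false'
              is-containment='true' unit-name='count'>
      <annotation name='UnitEntityName' value='percents'/>
    </property>
  </type>
</types>
"""

def find_parent_types(xml_text, metric_name):
    """Return (type name, extends) for every type that declares the given metric property."""
    root = ET.fromstring(xml_text.strip())
    parents = []
    for type_element in root.iter("type"):
        for prop in type_element.findall("property"):
            if prop.get("name") == metric_name and prop.get("type") == "Metric":
                parents.append(
                    (type_element.get("name"), type_element.get("extends"))
                )
    return parents

parents = find_parent_types(SAMPLE_SCHEMA, "file_physical_io_pct")
```

For the sample fragment, the result names DBO_Datafile_IO_Activity as the parent type, together with the type it extends.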

Contact Quest Support with this information for further assistance and to determine if the issue has been resolved in the latest version of the affected cartridge.
