Derivation Related Issues
This service is most often the cause of diagnostic snapshots that are 500 to 600 MB in size. Large file sizes indicate you may have an issue with an excessive number of derivations.
Search the file for: with .* derivation rulettes. The results should resemble the following:
DATA_DRIVEN with 115 derivation rulettes
DATA_DRIVEN with 115 derivation rulettes
DATA_DRIVEN with 187230 derivation rulettes
DATA_DRIVEN with 97 derivation rulettes
DATA_DRIVEN with 115 derivation rulettes
The third line (187230 derivation rulettes) indicates an issue because its count is far larger than the counts on the other lines. Locate the complex derivation definition, which appears above the with .* derivation rulettes section in the file, and make a note of the topology type and metric name. For example:
Complex derivation definition: DBSS_Total_Elapsed_Time_Per_Exec (null) : DerivationCalculation for DBSS_Top_Sql
where DBSS_Top_Sql is the topology type and DBSS_Total_Elapsed_Time_Per_Exec is the metric name.
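The search described above can be sketched in Python. This is a minimal illustration, not a Quest tool: the function name find_rulette_outliers and the cutoff of 100 times the median count are assumptions chosen for this example, and the sample text is the output shown above.

```python
import re
from statistics import median

# Matches lines such as "DATA_DRIVEN with 187230 derivation rulettes".
RULETTE_LINE = re.compile(r"with (\d+) derivation rulettes")


def find_rulette_outliers(snapshot_text, factor=100):
    """Return (line_number, count) pairs whose count exceeds factor x the median.

    The factor is an arbitrary assumption for this sketch; the guide only says
    the count should be significantly larger than the others.
    """
    hits = []
    for line_number, line in enumerate(snapshot_text.splitlines(), start=1):
        match = RULETTE_LINE.search(line)
        if match:
            hits.append((line_number, int(match.group(1))))
    if not hits:
        return []
    typical = median(count for _, count in hits)
    return [(n, count) for n, count in hits if count > factor * typical]


sample = """\
DATA_DRIVEN with 115 derivation rulettes
DATA_DRIVEN with 115 derivation rulettes
DATA_DRIVEN with 187230 derivation rulettes
DATA_DRIVEN with 97 derivation rulettes
DATA_DRIVEN with 115 derivation rulettes
"""

print(find_rulette_outliers(sample))
```

The line numbers returned are relative to the text passed in, so when run against a full snapshot they point to the place to start looking upward for the complex derivation definition.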
If there is evidence of derivation problems, contact Quest Support with this information for further assistance and to determine if the issue has been resolved in the latest version of the affected cartridge.
---- com.quest.nitro:service=Topology
This is the topology service.
The most useful part of this service is the extra information, which lists all topology types and, for each type, the number of instances, the number of instance versions, the maximum versions, and the effective instance versions. This information helps you determine whether the topology is too large or whether there is topology churn. Look for types with a high number of instance versions or a high maximum versions value.
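The topology table has six columns. As used in the examples below, they are, in order: the topology type name; the total number of instance versions; the maximum number of versions of a single instance; the number of instances; the number of new versions created in the last 7 days; and the number of new instances created in the last 7 days.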
Topology Related Issues
Topology churn is defined as the constant changing and creation of new versions of existing topology objects. Each time a property is updated on an instance, a new version of that instance is created. Topology churn can cause high CPU usage as the Management Server propagates the changes across the rest of the topology model.
Topology growth is defined as the continuous creation of new instances of a type of topology object. Topology growth can cause high CPU usage as models and rulettes are updated, as well as increased JVM heap usage. The entire topology model is stored in memory, so as the number of objects added increases, so does the heap usage.
If the values in columns five and six in the table above are greater than 5000, examine the highest numbers and work your way down the list. Resolving issues with the higher ones can sometimes resolve other churn issues, since topology changes to one object can cause changes in other objects.
For example, consider the sample rows of the topology table below:
| DBO_Alert_Log | 76 | 2 | 38 | 76 | 38 |
This is an example of a good model. There are 38 instances (column 4), with a maximum of 2 versions (column 3), for a total of 76 versions (column 2). The numbers are in balance.
| DBO_Datafile | 5816 | 2 | 2908 | 2 | 1 |
This is also an example of a good, stable model. Even though the numbers are higher, 2908 x 2 = 5816, so the numbers are in balance. Additionally, in the last 7 days, there was only 1 new object, with 2 changes. There is no large growth or churn in this example.
| DBO_Undo_Activity_Info | 393761 | 16810 | 39 | 0 | 0 |
This is an example of a model that was bad but has become good. There are 393761 total versions in history, but no new changes (0) in the last 7 days.
| HostNetwork | 238231 | 4472 | 846 | 234543 | 42 |
This is a bad topology model. A large number (234543) of new versions have been created in the past 7 days.
| VMWESXServerPhysicalDisk | 28652 | 3 | 3530 | 10590 | 5295 |
This is also a bad model. In the past 7 days, 5295 new instances have been created. Column 4 indicates that some stale object cleanup has been done, but unless the root cause is found, the instances will keep being created.
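The threshold check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the column meanings are read from the sample rows above (column 5 is new versions in the last 7 days, column 6 is new instances in the last 7 days), and the function name classify_row and the labels "churn" and "growth" are hypothetical names chosen for this example.

```python
THRESHOLD = 5000  # from the guidance above for columns five and six


def classify_row(row):
    """Split one pipe-delimited topology row and flag churn and growth.

    Assumed column order, inferred from the sample rows: type name, total
    versions, maximum versions, instances, new versions (7 days),
    new instances (7 days).
    """
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    name = cells[0]
    _total, _max_versions, _instances, new_versions, new_instances = map(int, cells[1:6])
    findings = []
    if new_versions > THRESHOLD:
        findings.append("churn")   # many new versions of existing objects
    if new_instances > THRESHOLD:
        findings.append("growth")  # many newly created instances
    return name, findings


rows = [
    "| DBO_Alert_Log | 76 | 2 | 38 | 76 | 38 |",
    "| DBO_Datafile | 5816 | 2 | 2908 | 2 | 1 |",
    "| DBO_Undo_Activity_Info | 393761 | 16810 | 39 | 0 | 0 |",
    "| HostNetwork | 238231 | 4472 | 846 | 234543 | 42 |",
    "| VMWESXServerPhysicalDisk | 28652 | 3 | 3530 | 10590 | 5295 |",
]

for row in rows:
    print(classify_row(row))
```

Sorting the flagged rows by column 5 or column 6 in descending order gives the "highest numbers first" working order recommended above.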
If there is evidence of topology problems, contact Quest Support with this information for further assistance and to determine if the issue has been resolved in the latest version of the affected cartridge.
---- com.quest.nitro:service=DataCacheEviction
This service lists metrics that are being held in the JVM waiting to be written to the database permanently. This information is located in the Cache Policies section of the diagnostic snapshot.
If many (thousands) of metrics are held in memory for long periods of time, they cannot be cleaned up by the garbage collector (GC) because they are active/live objects. As a result, a large portion of memory is taken up by data that should have been written to the database. This can lead to JVM heap exhaustion and performance problems.
The following is an example of the Cache Policies section of the diagnostic snapshot:
Cache Policies:
cbc82b6a-1f8c-4fa8-a88a-fcb07af2854e:file_physical_io_pct - age:259200000 granularity:300000 cached duration:259500000 num values:123 delay:192764
c25e1d2-c20b-400e-b87c-b9749c28899a:DBO_File_Avg_Read_Time_Ms - age:259200000 granularity:300000 cached duration:259500000 num values:122 delay:43089
1a6c1537-3050-499d-9e93-1f702b1ab77f:file_read_time - age:259200000 granularity:300000 cached duration:259500000 num values:140 delay:11026
1279f9c8-c72d-4deb-9080-73eeed73d70a:DBO_Datafile_File_Write_Requests_Rate - age:259200000 granularity:300000 cached duration:259500000 num values:118 delay:12308
71d83485-c441-45e9-9cc6-ccd3bb510f71:file_physical_writes - age:259200000 granularity:300000 cached duration:259500000 num values:136 delay:37018
Each line can be broken down as follows:
cbc82b6a-1f8c-4fa8-a88a-fcb07af2854e—topology object ID
file_physical_io_pct—name of the metric
age:259200000—length of time the metric is kept in memory (in ms)
granularity:300000—rawness of the metric value (in ms)
num values:123—number of values of this metric on this object
delay:192764—length of time the metric has been in memory
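The breakdown above can be sketched as a small Python parser. This is an illustration only: the regular expression is written against the sample lines shown in this section, and the names POLICY_LINE and parse_policy are hypothetical. The cached duration field is captured as well, although its meaning is not described in the breakdown.

```python
import re

# Pattern written against the sample Cache Policies lines in this section.
POLICY_LINE = re.compile(
    r"^(?P<object_id>[0-9a-f-]+):(?P<metric>\S+) - "
    r"age:(?P<age>\d+) granularity:(?P<granularity>\d+) "
    r"cached duration:(?P<cached_duration>\d+) "
    r"num values:(?P<num_values>\d+) delay:(?P<delay>\d+)$"
)

NUMERIC_FIELDS = ("age", "granularity", "cached_duration", "num_values", "delay")


def parse_policy(line):
    """Return the fields of one cache policy line as a dict, or None if no match."""
    match = POLICY_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        fields[key] = int(fields[key])
    return fields


line = (
    "cbc82b6a-1f8c-4fa8-a88a-fcb07af2854e:file_physical_io_pct - "
    "age:259200000 granularity:300000 cached duration:259500000 "
    "num values:123 delay:192764"
)
policy = parse_policy(line)
print(policy["metric"], policy["num_values"], policy["delay"])
# age and granularity are documented as milliseconds, so they can be converted:
print(policy["age"] / 86_400_000, "days;", policy["granularity"] / 60_000, "minutes")
```

For the sample line, the age of 259200000 ms works out to 3 days and the granularity of 300000 ms to 5 minutes.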
You can search the diagnostic snapshot for the metric name, and locate its parent topology in the XML schema. For example, for the metric detailed above:
<property name='file_physical_io_pct' type='Metric' is-many='false' is-containment='true' unit-name='count'>
<annotation name='UnitEntityName' value='percents'/>
</property>
This metric is contained in the following XML tag:
<type name='DBO_Datafile_IO_Activity'
extends='DBO_Instance_Alarm_Object'>
This indicates that the file_physical_io_pct metric is part of the DBO topology.
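The lookup of a metric's parent type can be sketched in Python. This is a simplified illustration, not a schema parser: it assumes single-quoted attributes as in the fragments above, and it treats the nearest preceding <type name='...'> opening tag as the parent. The function name find_parent_types is hypothetical, and the sample text is assembled from the two fragments shown above, with a closing </type> tag added for completeness.

```python
import re

# Opening tag of a type definition; attributes may continue on the next line.
TYPE_TAG = re.compile(r"<type\s+name='([^']+)'")


def find_parent_types(schema_text, metric_name):
    """Return the type name preceding each definition of the given metric."""
    prop = re.compile(r"<property\s+name='" + re.escape(metric_name) + r"'")
    parents = []
    for prop_match in prop.finditer(schema_text):
        enclosing = None
        # Keep the last <type ...> tag that opens before this property.
        for type_match in TYPE_TAG.finditer(schema_text, 0, prop_match.start()):
            enclosing = type_match.group(1)
        if enclosing is not None:
            parents.append(enclosing)
    return parents


schema = """\
<type name='DBO_Datafile_IO_Activity'
extends='DBO_Instance_Alarm_Object'>
<property name='file_physical_io_pct' type='Metric' is-many='false' is-containment='true' unit-name='count'>
<annotation name='UnitEntityName' value='percents'/>
</property>
</type>
"""

print(find_parent_types(schema, "file_physical_io_pct"))
```

A metric name may be defined on more than one type, which is why the sketch returns a list rather than a single name.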
Contact Quest Support with this information for further assistance and to determine if the issue has been resolved in the latest version of the affected cartridge.