This article explains the impact of topology churn on Foglight Management Server (FMS) performance and stability. It outlines common symptoms, causes, and recommended actions for reducing or managing topology churn.
Common symptoms include:

- Slow or unresponsive Foglight dashboards
- Delayed or missed alarms and email notifications
- High FMS memory usage or frequent Full GC events
- Agent registration failures or missed data collections
- Support bundle generation failures or timeouts
- Duplicate or missing topology objects in the UI
Topology churn occurs when topology objects (e.g., database instances, hosts, VMs, network connections) are frequently created, updated, or deleted. This can be caused by:
- Flapping agents (e.g., due to connectivity issues or misconfiguration)
- Overactive or repeated discovery jobs
- Automation scripts frequently altering monitored assets
- High-frequency custom data collections that create new objects dynamically
- Concurrent agent provisioning or restarts
Churn increases memory, CPU, database I/O, and internal event processing overhead on the FMS, often resulting in degraded performance.
| Area | Impact |
|---|---|
| Heap & Garbage Collection | Increased old-gen usage, frequent Full GC, risk of OutOfMemoryError |
| Rule Engine | Delays in rule evaluation, missed or duplicated alarms |
| UI and Dashboards | Slow page loads, partial rendering, object resolution errors |
| Agent Communication | Registration failures, delayed config sync, connection timeouts |
| Database & Storage | Higher I/O from persistent topology writes, contention on internal tables |
| Topology Integrity | Duplicate or orphaned objects, broken relationships |
| Thread and Task Queues | Backlogged tasks, scheduler misfires, internal timeouts |
To identify which topology types and objects are changing most frequently, the attached find-frequent-changes scripts can be run on the FMS. The following variables at the top of each script can be adjusted:

- typeName = "Host"; (the topology type to report on)
- numDays = 1; (the time period the script queries)
- churnThreshold = 10; (the minimum number of changes to an object for it to be reported; setting this to 0 reports everything)
- numObjects = 10; (the number of entries reported in the output)

The attached script "find-frequent-changes_4hours_hostprocess.groovy" is set to check for frequent topology changes during the last 4 hours; change the "typeName" variable to the desired topology type.
After changes have been implemented to clean up the topology types causing high churn, the attached script "find-frequent-changes_15min_hostprocess.groovy" can be run to find frequent changes for the affected types over the last 15 minutes, to verify the behavior over a shorter window. In that script the time window is set with startTime = now - (15 * 60 * 1000L);
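For orientation only, below is a minimal, simplified sketch of this kind of check written for the Script Console; it is not the attached script. It assumes the standard bindings expose server.TopologyService with getType() and getObjectsOfType(), and it treats TopologyObject.getObjectVersion() as a cumulative change counter (an assumption; verify the method against the installed Foglight API). Unlike the attached scripts, it ranks objects by lifetime version rather than by changes within a specific time window, so the attached scripts remain the recommended approach.

```groovy
// Simplified sketch only; the attached find-frequent-changes scripts are the supported method.
// Assumptions: server.TopologyService with getType()/getObjectsOfType() is available in the
// Script Console, and TopologyObject.getObjectVersion() acts as a cumulative change counter.

def typeName       = "Host"   // topology type to report on
def churnThreshold = 10       // minimum change count before an object is listed
def numObjects     = 10       // number of entries to print

def topSvc = server.TopologyService
def objs   = topSvc.getObjectsOfType(topSvc.getType(typeName))

// Keep objects whose version (change count) exceeds the threshold,
// sort them in descending order, and print the top entries.
def ranked = objs.findAll { it.getObjectVersion() > churnThreshold }
                 .sort { -it.getObjectVersion() }
                 .take(numObjects)

def out = new StringBuilder()
ranked.each { obj ->
    out.append("${obj.get('name')}  changes=${obj.getObjectVersion()}\n")
}
return out.toString()
```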
After identifying topology churn, different solutions may apply depending on the impacted agents and topology objects; please review the following articles:
- For duplicate Infrastructure agents:
- Also check for Host aliases:
- For FileLogMonitorFileGroup objects:
- For duplicate IP addresses:
- For database agents and objects:
- For VMware agents and objects: How to identify Virtual Machines with duplicate vmID (KB 4292241)
- For MultiHostProcessMonitorAgent:
- Pause or remove flapping or unused agents.
- Alternatively, add the following JVM parameter to the baseline.jvmargs.config file and restart the FglAM:
  vmparameter.0 = "-Dagent.collector.schedule.load.max.delay.millis=300000";
  This spreads agent activation over 5 minutes instead of the default 2 minutes, reducing collection overlap.
- Stagger agent start times in groups (20–30 agents every ~17 seconds) to reduce startup spikes.
- Track heap usage and garbage collection (see the Script Console sketch after this list).
- Monitor CPU, thread usage, and disk I/O.
- Review thread dumps for long-held locks or slow tasks.
- If symptoms persist, perform a controlled restart of the FMS (and FglAMs if needed) during off-peak hours.
- If FMS memory usage is consistently above 85%, increase the heap size in server.config or the service startup script (see the example after this list).
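As a reference for the heap and garbage collection check above, the following minimal sketch can be run from the FMS Script Console. It uses only standard java.lang.management (JMX) beans, no Foglight-specific API, and the output format is illustrative only.

```groovy
// Minimal sketch: report current FMS heap usage and GC counters from the Script Console.
// Uses only standard java.lang.management beans; the output format is illustrative.
import java.lang.management.ManagementFactory

def heap = ManagementFactory.memoryMXBean.heapMemoryUsage
def out  = new StringBuilder()

if (heap.max > 0) {
    out.append(String.format("Heap used: %.1f%% (%d MB of %d MB)\n",
            100.0 * heap.used / heap.max, heap.used >> 20, heap.max >> 20))
}
ManagementFactory.garbageCollectorMXBeans.each { gc ->
    // collectionCount and collectionTime are cumulative since JVM start
    out.append("GC ${gc.name}: count=${gc.collectionCount}, total time=${gc.collectionTime} ms\n")
}
return out.toString()
```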
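For the heap increase itself, the values are standard JVM -Xms/-Xmx flags. The snippet below is only a hedged illustration: the exact property names and option indices in server.config vary between Foglight versions, so match the existing vm option entries already present in that file rather than copying this verbatim.

```
# Hedged illustration only: match the naming and numbering of the vm option
# entries that already exist in <foglight_home>/config/server.config.
server.vm.option0 = "-Xms8192m";
server.vm.option1 = "-Xmx8192m";
```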