Foglight 5.9.5 - Performance Tuning Field Guide

Topology Changes and Topology Churn

Monitored data in the Foglight® Management Server can be sub-divided into two areas with distinct properties: the topology model and observations.

Topology objects are generally assumed to change very little over time, whereas observations are expected to be highly volatile.

The decision of whether to add a particular piece of data to the topology model or to treat it as an observation is made during cartridge development. This decision is expressed in the software through Canonical Data Transformation (CDT) configuration.

The server is generally optimized to handle stable topology models where topology changes are infrequent. If topology changes occur on a more regular basis, this is known as Topology Churn, and it usually results in diminished server performance.

The Management Server is therefore expected to be configured to minimize topology changes.

Indications of topology churn:

The browser interface performs poorly (that is, it responds slowly).

Topology queries (in the browser interface as well as in groovy scripts) are slow.

Data is being dropped by the Data Service.

To investigate possible topology churn:

Check the System Changes chart on the Alarms System dashboard.

Check the batchesDiscarded metric produced by the Data Service that is located in Dashboards > Foglight > Servers > Management Server View > Data Service.

Explore the topology model using the Data dashboard located in Dashboards > Configuration > Data and try to locate any noisy (that is, potentially redundant) or unexpected objects.

To reduce topology churn:

In some cases, the agent defaults may be too broad, so that the agent monitors everything that is visible to it. Configure agents to monitor only what is required.

Agent and/or CDT changes may be required. In such cases, support bundles are very helpful in the investigation.

Allocate more system resources to the Management Server.

The number of topology changes that the server can process depends greatly on the overall system configuration (host OS, database, and hardware).

Canonical Data Transformations (CDTs)

Theoretically, CDTs can be a performance bottleneck. Unfortunately, it is not easy to tune them. In most cases, a cartridge update is required.

CDTs convert data received from agents into the server’s internal representation (that is, into the Canonical Data Form). Although this process is usually fast, it can be computation-intensive and therefore may cause performance issues.

Indications of a CDT bottleneck:

The server is generally slow overall.

The CDT transformTime metrics are high. Typical values for OS Cartridge agents are in the range of 0.01 to 0.1 seconds, in 15-minute intervals.

CDT transformTime metrics can be accessed through the browser interface by going to Dashboards > Configuration > Data > Foglight > All Data > AllTypeInstances > TopologyObject > subTypes > CanonicalDataTransform > instances > ... > transformTime.

CDT tasks are visible in thread dumps.

Tuning will probably have to be done by the agent development team. A support bundle along with the thread dumps will be very helpful in the investigation.
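The transformTime check described above can be sketched in a few lines once the metric values have been copied out of the Data dashboard. This is only an illustrative sketch: the CDT names, the sample values, and the dictionary layout are hypothetical, and the 0.1 second threshold is simply the upper end of the range quoted above for OS Cartridge agents, so other cartridges may warrant a different value.

```python
# Upper end of the typical range quoted for OS Cartridge agents (seconds).
TYPICAL_MAX_SECONDS = 0.1


def slow_cdts(samples, threshold=TYPICAL_MAX_SECONDS):
    """Return (cdt_name, worst_value) pairs whose worst sample exceeds
    the threshold, slowest first.

    `samples` maps a CDT name to a list of transformTime values in seconds,
    one value per 15-minute interval. This layout is an assumption made for
    the example, not a Foglight export format.
    """
    worst = {name: max(values) for name, values in samples.items() if values}
    flagged = [(name, value) for name, value in worst.items() if value > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical values, typed in by hand for illustration only.
example_samples = {
    "ExampleOsCdt": [0.02, 0.05, 0.04],
    "ExampleAppCdt": [0.08, 0.9, 1.4],
    "ExampleDbCdt": [0.3, 0.2],
}

for name, value in slow_cdts(example_samples):
    print(f"{name}: worst transformTime {value:.2f} s")
```

CDTs flagged this way are candidates to mention, together with the thread dumps, when the support bundle is handed over for investigation.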

Agent Weight / Environment Complexity

The Foglight® Management Server maintains an internal metric that represents, roughly, the amount of work the server has to do in order to process the data collected by the agents.

This metric is called aggregateAgentWeight. It is available from the EnvComplexityEstimator service in the Management Server Metrics dashboard: Dashboards > Foglight > Servers > Management Server Metrics > <CatalystServer> > Services > > aggregateAgentWeight.

This metric is derived from the number and types of connected agents according to: <foglight_home>/config/agent-weight.config.

The value of the metric is typically expressed in agent units. Recent server builds generally work well with up to 4000 agent units connected.

The agent-weight.config set-up is based on Quality Assurance (QA) capacity test results. Generally, it should not be changed. However, if new data on the relative agent weight is available, the configuration file can be adjusted manually. You must restart the server after you change this configuration file.
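To make the idea of agent units more concrete, the following sketch estimates an aggregate weight from agent counts. It assumes that the metric is a simple weighted sum (count multiplied by a per-type weight), which this guide does not confirm. The agent type names, the weights, and the dictionary layout are hypothetical and do not reflect the actual contents or format of agent-weight.config. Only the 4000 agent unit figure comes from the text above.

```python
# Figure quoted above for recent server builds.
SUPPORTED_AGENT_UNITS = 4000


def aggregate_agent_weight(agent_counts, weights, default_weight=1.0):
    """Estimate the aggregate weight as sum(count * weight) over agent types.

    The weighted-sum formula is an assumption for illustration. Agent types
    missing from `weights` fall back to `default_weight`.
    """
    return sum(
        count * weights.get(agent_type, default_weight)
        for agent_type, count in agent_counts.items()
    )


# Hypothetical environment and weights, for illustration only.
example_counts = {"HostAgent": 1500, "DatabaseAgent": 200, "AppServerAgent": 50}
example_weights = {"HostAgent": 1.0, "DatabaseAgent": 5.0, "AppServerAgent": 20.0}

total = aggregate_agent_weight(example_counts, example_weights)
headroom = SUPPORTED_AGENT_UNITS - total
print(f"Estimated weight: {total:.0f} agent units, headroom: {headroom:.0f}")
```

An estimate of this kind is useful only for rough capacity planning. The aggregateAgentWeight metric reported by the server remains the authoritative value.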

Large Topologies

On occasion, agents may send so much observation data that the resulting topology model becomes too large and causes performance problems in the server.

Java EE Technologies agents are the most likely to cause this type of situation, if they are not carefully pre-configured.

Indications of an overly large topology:

The server is in an overload condition.

Agent data is being dropped.

The browser interface performance is poor (that is, it responds slowly).

Create a support bundle and check the topology size breakdown. A topology size breakdown by type is available in the diagnostic snapshot files that are part of each Management Server support bundle. Look for large topology object instance counts.
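Once the instance counts per topology type have been read from the diagnostic snapshot, the types that dominate the model can be ranked as in the sketch below. The snapshot file format is not described in this guide, so the sketch starts from a hand-built dictionary. The counts are hypothetical, and the 25% share threshold is an arbitrary value chosen for the example.

```python
def dominant_types(instance_counts, min_share=0.25):
    """Return (type_name, count, share) for every topology type holding at
    least `min_share` of all instances, largest first.

    `instance_counts` maps a topology type name to its instance count. The
    threshold is an illustrative assumption, not a Foglight recommendation.
    """
    total = sum(instance_counts.values())
    if total == 0:
        return []
    ranked = sorted(instance_counts.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, count, count / total)
        for name, count in ranked
        if count / total >= min_share
    ]


# Hypothetical counts, typed in by hand for illustration only.
example_breakdown = {
    "JavaEEAppServerRequestType": 180000,
    "Process": 14000,
    "FileSystem": 4800,
    "Host": 1200,
}

for name, count, share in dominant_types(example_breakdown):
    print(f"{name}: {count} instances ({share:.0%} of the topology)")
```

A type that accounts for most of the instances points to the agents whose configuration should be reviewed first.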

1. Stop data collection on agents that produce an excessive number of topology objects.

2. Re-configure those agents to produce reasonable amounts of topology data.

3. Delete any excess topology objects.

4. Resume data collection on the affected agents.

If there is a large number of JavaEEAppServerRequestType topology objects, you can reduce their number by adjusting the FilteringRules parameter in recording.config.
