
Foglight 7.1.0 - Performance Tuning Field Guide


Shares

For the Foglight® Management Server to run in a virtual image, share allocations must provide the Management Server with sufficient priority over the other VMs to ensure that the Management Server can process incoming data and browser interface requests in a timely manner.

Disk

Disk size is fixed at the time of virtual image creation, so ensure beforehand that enough disk space is allocated to the image (as per the platform-sizing guideline).

In a VMware® environment, disk (in the form of LUNs) is allocated to a Data Center, ESX® Servers, and VMs. The VMs from many ESX Servers can reside on the same LUN, which can also be shared by many ESX Servers. Therefore, the disk activity in a VM on an ESX Server can have a serious adverse impact on the performance of the VM in which the Foglight® Management Server is running, even when the VMs are running on separate ESX Servers—a situation that cannot arise when the Management Server is running on physical hardware.

 

Management Server Tuning

The parts of the Foglight® Management Server that have the greatest influence on runtime performance are topology (management and querying), observations (conversion, storage, and retrieval), and alarms and alarm processing (derivations and rules). Supporting these components and their associated calculations requires an architecture that includes queues, thread pools, caches, and so on.

The Management Server architecture can be tuned in the following ways.

Table 1. Alarm Limit

Much of the information presented in the browser interface is derived from alarms (alarm counts, topology object states, and so on). Poorly configured rules can create a very large number of alarms within the system. The server imposes a default limit of 10000 alarms that can be displayed and used to calculate object states. When the server enforces this limit, it gives preference to the current alarms so that objects are presented in the correct state.

Symptoms:

The browser interface performance is poor (that is, it responds slowly), especially when displaying the alarms table.

The server is not overloaded, and is still processing data.

CPU usage on the server is high when it is providing alarm information.

Threads may become blocked for several seconds, intermittently, when attempting to load alarm details.

The alarm limit may need to be reduced.

MBean: *:service=Alarm

Attribute: MaxAlarms

Expected old value: 10000

New value: 5000

You can set this parameter using the foglight.alarm.query.max_alarms Java™ system property. For example, in server.config, add:

server.vm.option0="-Dfoglight.alarm.query.max_alarms=5000";

If the alarm limit is reduced to the point where the server is not able to load all of the current alarms into the system, then some topology objects may be displayed with an incorrect state.
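
Because server.config may already contain other server.vm.optionN entries, it can help to pick an unused index before adding the property. The following is a minimal sketch, assuming that every JVM option line has the form shown above and that each index may be used only once; add_vm_option is a hypothetical helper name and is not part of Foglight.

```python
import re

# Matches active (uncommented) lines such as: server.vm.option0="-Dfoo=bar";
_OPTION_RE = re.compile(r'^[ \t]*server\.vm\.option(\d+)\s*=', re.MULTILINE)


def add_vm_option(config_text: str, jvm_flag: str) -> str:
    """Return config_text with jvm_flag appended under the lowest unused index.

    Assumption: the numeric suffix identifies an option and must be unique.
    Check the comments in your own server.config before relying on this.
    """
    used = {int(match.group(1)) for match in _OPTION_RE.finditer(config_text)}
    index = 0
    while index in used:
        index += 1
    line = f'server.vm.option{index}="{jvm_flag}";'
    # Add a newline first only if the existing text does not already end with one.
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return f"{config_text}{separator}{line}\n"


if __name__ == "__main__":
    existing = 'server.vm.option0="-Dfoglight.threadpool.cpu.count=4";\n'
    print(add_vm_option(existing, "-Dfoglight.alarm.query.max_alarms=5000"), end="")
```

In this example index 0 is already taken, so the alarm limit is written as server.vm.option1. Review the result before replacing the real file, and restart the Management Server for JVM options to take effect.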

Topology Limit

The amount of processing that a server has to perform is often proportional to the number of topology objects (for example, rulettes) in the system. Cartridges that monitor very large systems may create enough topology objects to overload the server. To protect against this, the topology service is configured to limit the number of instances of each object type that can be created.

Symptoms:

The browser interface performance is poor (that is, it responds slowly).

The server memory usage is high.

The server may be overloaded.

Examine the topology instance counts shown in a support bundle.

If there is an object type with an excessive number of effective topology objects, then some of those objects may need to be deleted. Try to determine why the cartridge created so many objects of that type. Ideally, you should modify the cartridge configuration to prevent it from recreating those objects.

As a precaution, you can reduce the topology limit for the object type. The limit for a type can be set using the foglight.limit.instances registry variable.

When the limit for an object type is reached, messages appear in the server log and an alarm is raised in the browser interface. If it is reasonable for the object type in question to have a large number of instances, then the limit for that type should be increased to prevent the error messages from being generated. If it is not reasonable for a type to have a large number of instances, you should tune your agent so that it does not create so many of them.

Thread Pool CPU Count

The Management Server uses a scaling algorithm to calculate the number of threads in its thread pools. This algorithm takes into account the number of CPUs available on the server. However, when the Management Server shares the server with other applications, it can be desirable to limit the number of CPUs that the Management Server takes into account. Note that this setting affects only the thread pool calculation; it does not limit the CPUs that the Management Server actually uses.

You can set this parameter using the foglight.threadpool.cpu.count Java™ system property. For example, in server.config, add:

server.vm.option0="-Dfoglight.threadpool.cpu.count=4";
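
One way to arrive at a value is to subtract the CPUs you want to leave for the other applications from the total that the operating system reports. The sketch below is illustrative only: the subtraction policy, the function name cpu_count_option, and its parameters are assumptions rather than a Foglight recommendation, and the option index is a parameter because index 0 may already be in use.

```python
import os
from typing import Optional


def cpu_count_option(reserved_for_others: int, option_index: int = 0,
                     total_cpus: Optional[int] = None) -> str:
    """Build a server.config line for foglight.threadpool.cpu.count.

    Hypothetical policy: count = total CPUs - CPUs reserved for other
    applications, never below 1. The property only influences thread pool
    sizing; it does not restrict which CPUs the server actually uses.
    """
    if reserved_for_others < 0:
        raise ValueError("reserved_for_others must be non-negative")
    if total_cpus is None:
        total_cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    count = max(1, total_cpus - reserved_for_others)
    return (f'server.vm.option{option_index}='
            f'"-Dfoglight.threadpool.cpu.count={count}";')


if __name__ == "__main__":
    # Example: an 8-CPU host where 4 CPUs are left for other applications.
    print(cpu_count_option(reserved_for_others=4, total_cpus=8))
```

For an 8-CPU host with 4 CPUs reserved, this produces the same line as the example above. Treat the result as a starting point and confirm it against observed thread pool behavior.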

Dashboard Default Timeout

If your dashboards are timing out, try increasing the default timeout value:

1. Open the <foglight_home>/server/default/deploy-foglight/console.war/scripts/ directory.
3. Search for DEFAULT_TIMEOUT.
4. Increase the value of DEFAULT_TIMEOUT to 180000. This increases the console default timeout to 3 minutes. (The value is in milliseconds; 1000 is one second.)
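
Locating DEFAULT_TIMEOUT within the scripts directory can also be scripted. The sketch below assumes only that the string DEFAULT_TIMEOUT appears literally in one or more text files under that directory; find_default_timeout and minutes_to_ms are hypothetical helper names, and the script only reports matches, so the edit itself is still made by hand.

```python
from pathlib import Path
from typing import List, Tuple


def minutes_to_ms(minutes: int) -> int:
    """Convert minutes to milliseconds (1000 ms per second, 60 s per minute)."""
    return minutes * 60 * 1000


def find_default_timeout(scripts_dir: str) -> List[Tuple[str, int, str]]:
    """Return (file path, line number, line text) for each DEFAULT_TIMEOUT hit."""
    hits = []
    for path in sorted(Path(scripts_dir).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if "DEFAULT_TIMEOUT" in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Call find_default_timeout with the path to the scripts directory and inspect the returned lines before changing anything; minutes_to_ms(3) gives the 180000 value used in the last step. Back up any file before editing it.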

 
