A Foglight Management Server or Agent Manager is located on a virtual machine (VM) which runs out of Java heap space even though the VM is configured to use 16GB of RAM.
Why is it important to configure CPU (processor) and memory (RAM) reservations on virtual machines used by Foglight components?
The Foglight Management Server (FMS), Foglight Agent Manager (FglAM) and SQL PI repository all process a considerable amount of data received from monitored hosts. This constant stream of data from numerous remote servers leaves little processing time for Foglight and Infobright (the software underlying the SQL PI component) and forces most of the collected data to be processed in real time.
Most administrators prefer to share virtual machine (VM) resources instead of dedicating CPU or memory to a single virtual machine. However, because of how Foglight functions, CPU and memory reservations are critical to keeping Foglight and SQL PI working properly. Because VM resources are allocated on demand, there is a risk of resource contention. A reservation is a guarantee of either memory or CPU for a virtual machine. This setting has a major effect on Foglight performance because it guarantees that the Foglight VM does not have to compete with other VMs for its reserved resources.
Java Virtual Machine performance and Memory Reservations
From page 10 of VMware's Enterprise Java Applications on VMware Best Practices Guide:
"BP5 – Set memory reservation for VM (Virtual Machine) memory needs"
Memory allocated to the VMs on a host can be shared between them as long as it is not actually in use (i.e. the memory is allocated but not used). However, when all of the VMs have high memory usage, resource contention will occur.
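The effect of a memory reservation under contention can be illustrated with a small sketch. This is a simplified toy model, not how any particular hypervisor accounts for memory: the `VM` class, the `memory_at_risk` function, and all of the sizes below are hypothetical and chosen only for illustration. It assumes that contention occurs when the memory actively used by all VMs exceeds the host's physical memory, and that only memory covered by a reservation is guaranteed.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    configured_gb: int    # memory the guest believes it has
    active_gb: int        # memory the guest is actually using
    reserved_gb: int = 0  # memory the host guarantees to this VM


def memory_at_risk(host_gb, vms):
    """Return {vm name: GB of active memory not covered by a reservation}
    when total active demand exceeds host memory; an empty dict otherwise."""
    total_active = sum(vm.active_gb for vm in vms)
    if total_active <= host_gb:
        return {}  # no contention: unused allocations can be shared safely
    return {
        vm.name: vm.active_gb - vm.reserved_gb
        for vm in vms
        if vm.active_gb > vm.reserved_gb
    }


# Hypothetical 32 GB host running three VMs configured with 16 GB each.
busy_host = [
    VM("foglight", configured_gb=16, active_gb=14, reserved_gb=16),
    VM("app1", configured_gb=16, active_gb=12),
    VM("app2", configured_gb=16, active_gb=12),
]
at_risk = memory_at_risk(32, busy_host)
print(at_risk)  # the reserved Foglight VM is not listed
```

In this sketch the VM with a full reservation never appears in the result, while the same VM without a reservation would have all of its active memory exposed to reclamation (for example ballooning or swapping) once the host is overcommitted.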
PostgreSQL database performance and CPU reservations
Based on our internal testing and field experience from hundreds of customer environments, any SQL PI system lacking CPU reservations is likely to encounter stability issues.
As the number of agents on a PI repository increases, there is a proportional increase in the number of CPUs necessary for Infobright and PostgreSQL processing. These “wide” virtual machines (with a high number of vCPUs) can run into scheduling problems when mixed with “narrow” virtual machines (with a small number of vCPUs) on the same host. The problem occurs because the physical CPUs are shared in time slices with the other VMs running on the same host. The SQL PI workload in a wide VM may stall periodically while waiting for processors to become available.
Specifically for VMware, ESXi assigns CPU resources to a virtual machine when physical CPU cores are available for all of that machine's cores. If a machine has 8 cores, it has to wait until 8 physical CPU cores are free. If other VMs on the same ESXi host have 1, 2, or 4 cores, they get priority for slices of processing time because the smaller number of CPUs becomes available sooner.
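The waiting behavior described above can be sketched with a toy calculation. This is an illustration only, under the simplifying assumption that a VM can run in a time slice only when enough physical cores are free for all of its vCPUs at once; real hypervisor schedulers are more sophisticated. The `ready_slices` function and the load pattern are hypothetical.

```python
def ready_slices(host_cores, vcpus, busy_cores_per_slice):
    """Count the time slices in which a VM that needs `vcpus` free cores
    at the same time cannot be scheduled and has to wait."""
    return sum(
        1 for busy in busy_cores_per_slice
        if host_cores - busy < vcpus
    )


# Hypothetical 8-core host. Each entry is the number of cores already
# occupied by other VMs during one time slice.
busy = [1, 3, 0, 6, 2, 0, 7, 4]

wide_wait = ready_slices(8, 8, busy)    # 8-vCPU VM: runs only when the host is idle
narrow_wait = ready_slices(8, 2, busy)  # 2-vCPU VM: fits in most slices

print(wide_wait, narrow_wait)
```

Under the same load, the 8-vCPU VM waits in 6 of the 8 slices while the 2-vCPU VM waits in only 1. A CPU reservation avoids this kind of waiting for the wide VM by guaranteeing it processor capacity.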
Viewing VM performance metrics at the hypervisor level might not provide a clear indication of the resources Foglight requires. The need for resources is based largely on the product configuration and architecture, as well as the types and volume of traffic received. Insufficient VM resources can cause excessive swapping and escalating memory usage, leading to instability, performance problems, out of memory errors, and even file system corruption.
Please refer to KB article 238293 for details on configuring VM reservations in Hyper-V.
Please refer to KB article 233207 for details on configuring VM reservations in VMware.
Please refer to KB article 259596 for details on finding VMware reservations in the FglAM bundles for versions 5.9.3 and higher.
This external article provides an excellent analogy for CPU Ready in a real-world scenario.