Agent Managers (FglAMs) are establishing too many connections to the Foglight Management Server (FMS), and the FglAM logs show the following message:
2017-04-03 18:27:29.539 ERROR [Quartz[0]-800] com.quest.glue.core.comms.transport.http.Client - Could not send upstream request to https://fms-server:8443/catalyst-glue-service/message/secure. Received response code 429. Explanation: Too Many Requests. Current connection request limit has been reached [16]. Try again soon.. Extra info: Last successful connection was at 2017-04-03T18:09:20-0700. 41 connection failures so far.
The exact message varies with the FMS address, the configured connection limit, and the failure count.
On rare occasions, this can cause lost collections.
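To check how often this is happening, the variable parts of the message can be pulled out of the FglAM log. The following is a minimal sketch, not a Foglight tool: the regular expression is derived only from the sample message quoted above, and the function name parse_throttle_line is hypothetical. Other FglAM versions may word the message differently, in which case the pattern will need adjusting.

```python
import re

# Pattern derived from the single sample message in this article (assumption:
# other FglAM versions keep the same wording).
THROTTLE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ERROR .*?"
    r"Could not send upstream request to (?P<url>\S+)\. "
    r"Received response code (?P<code>\d+)\..*?"
    r"limit has been reached \[(?P<limit>\d+)\].*?"
    r"(?P<failures>\d+) connection failures so far"
)


def parse_throttle_line(line):
    """Return the variable fields of a 429 message, or None if the line does not match."""
    match = THROTTLE_PATTERN.match(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "url": match.group("url"),
        "code": int(match.group("code")),
        "limit": int(match.group("limit")),
        "failures": int(match.group("failures")),
    }


# The sample message from this article.
SAMPLE = (
    "2017-04-03 18:27:29.539 ERROR [Quartz[0]-800] "
    "com.quest.glue.core.comms.transport.http.Client - Could not send upstream "
    "request to https://fms-server:8443/catalyst-glue-service/message/secure. "
    "Received response code 429. Explanation: Too Many Requests. Current "
    "connection request limit has been reached [16]. Try again soon.. Extra info: "
    "Last successful connection was at 2017-04-03T18:09:20-0700. "
    "41 connection failures so far."
)

info = parse_throttle_line(SAMPLE)
print(info["code"], info["limit"], info["failures"])  # prints: 429 16 41
```

Lines that do not match return None, so the function can be applied to every line of a log file and the non-matching lines skipped.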
Some environments may include too many Agent Managers that try to connect to the FMS during a short period of time. The concurrent connection limit is set to 16 by default.
If there are many FglAMs, several of them may try to connect at the same moment, for instance when the FMS is starting and begins to accept FglAM connections. The FMS accepts connections until it reaches the FglAM adapter limit. If the established connections have not been freed by the time the next connection is attempted, the FMS rejects that connection with response code 429.
This does not necessarily mean that the FglAMs will fail to connect to the FMS indefinitely. It only means that the rejected FglAMs will have to retry on the next cycle.
If the number of FglAMs is large and they need to upload a large volume of data to the FMS, the FglAM adapter may not be able to free up connections fast enough to allow new connections to be made. If FglAMs fail to establish a connection for longer than an hour, some collections will be discarded.
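The accept-then-retry behavior described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions and not the actual FMS implementation: every FglAM attempts one connection per cycle, each accepted connection holds a permit for a fixed number of cycles, and rejected FglAMs simply retry on the next cycle. The names simulate_startup, permits, and hold_cycles are hypothetical.

```python
def simulate_startup(num_agents, permits=16, hold_cycles=1):
    """Toy model of FglAMs connecting to an FMS with a fixed permit limit.

    Returns (cycles_needed, total_rejections), where a rejection stands for
    one 429 response. This is an illustration, not how the FMS is implemented.
    """
    if permits < 1 or hold_cycles < 1:
        raise ValueError("permits and hold_cycles must be at least 1")

    waiting = num_agents   # FglAMs that have not connected yet
    active = []            # remaining hold time of each permit in use
    cycles = 0
    rejections = 0

    while waiting > 0:
        cycles += 1
        # Free the permits whose connections finished in the previous cycle.
        active = [t - 1 for t in active if t > 1]
        free = permits - len(active)
        # Accept as many waiting FglAMs as there are free permits.
        accepted = min(free, waiting)
        waiting -= accepted
        active.extend([hold_cycles] * accepted)
        # Everyone still waiting was rejected with 429 and retries next cycle.
        rejections += waiting

    return cycles, rejections


print(simulate_startup(40))                 # prints: (3, 32)
print(simulate_startup(40, permits=24))     # prints: (2, 16)
print(simulate_startup(40, hold_cycles=2))  # prints: (5, 64)
```

In this model every FglAM eventually connects, but the number of rejections grows quickly when connections are held longer, which corresponds to FglAMs uploading large volumes of data.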
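To judge how close an FglAM is to the one-hour mark, the time since its last successful connection can be estimated from the error message itself. The sketch below assumes the message format shown in the sample at the top of this article, and assumes the timestamp at the start of the log line is in the same UTC offset as the "Last successful connection" value (the log timestamp carries no offset of its own). The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches the "Last successful connection" value as it appears in the sample message.
LAST_OK_PATTERN = re.compile(
    r"Last successful connection was at "
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{4})"
)


def disconnected_for(line):
    """Return how long the FglAM had been failing when the line was logged, or None."""
    match = LAST_OK_PATTERN.search(line)
    if match is None:
        return None
    last_ok = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S%z")
    # Assumption: the log timestamp uses the same UTC offset as last_ok.
    logged = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
    logged = logged.replace(tzinfo=last_ok.tzinfo)
    return logged - last_ok


def collections_at_risk(line, threshold=timedelta(hours=1)):
    """True if the FglAM has been unable to connect for longer than the threshold."""
    gap = disconnected_for(line)
    return gap is not None and gap > threshold


# Shortened version of the sample message from this article.
SAMPLE_LINE = (
    "2017-04-03 18:27:29.539 ERROR [Quartz[0]-800] "
    "com.quest.glue.core.comms.transport.http.Client - Could not send upstream "
    "request. Extra info: Last successful connection was at "
    "2017-04-03T18:09:20-0700. 41 connection failures so far."
)

print(disconnected_for(SAMPLE_LINE))     # prints: 0:18:09.539000
print(collections_at_risk(SAMPLE_LINE))  # prints: False
```

In the sample, the FglAM had been failing for about 18 minutes, so it had not yet reached the point where collections are discarded.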
If this issue is causing missing collections, increase the FglAM adapter "Max connection permits" by following KB152661.
Please keep in mind that KB152661 gives instructions for increasing the permits to 24, but this may not be enough for larger environments with several hundred FglAMs.
If issues persist and collections are being skipped or lost, and increasing the permits is causing the FMS to overload, it might be necessary to split monitoring using a Federation scheme. This would require an engagement from PSO.