KACE SMA Performance Considerations and Best Practices
This document refers to the ‘Admin UI’ and ‘System UI’ of the Systems Management Appliance throughout. To access either interface, append /adminui or /systemui to the URL of the KACE Systems Management Appliance (SMA). Examples:
Admin UI: https://kacesma.domain.loc/adminui
System UI: https://kacesma.domain.loc/systemui
Note: On a single-org SMA, attempting to access the System UI simply redirects to the Admin UI. On multi-org systems, settings are split between the Admin UI (per organization) and the System UI (global).
When troubleshooting performance issues on a KACE appliance, it is extremely important to understand the symptoms of the problem. When contacting KACE Support, the more precisely we can identify the issue, the faster we can resolve it.
For more information on this topic, please see the KACE SMA Course 2: Installing the KACE SMA Agent (Web-based Training).
Also, a KACE Support Webinar was delivered on Oct. 27, 2017 covering the details discussed in this article. The recording is available at https://support.quest.com/kb/234300
The topics listed in this article pertain to performance considerations and best practices for the KACE Systems Management Appliance. It is a living document, so keep it bookmarked and check back often.
First, ensure that virtual appliances meet the minimum requirements and that physical hardware is currently supported.
Specifications | Virtual Appliance Technical Specifications | Virtual machine system requirements
Specifications | Physical Appliance Technical Specifications | Hardware Specifications
Multi-org: System UI | Organizations | [ORG Name]
Single-org: Admin UI | Settings | Provisioning | Communication Settings
The SMA utilizes Apache for all agent and web UI activity, and Apache has a hard-coded limit of 800 simultaneous connections. Apache connections are typically very quick and temporary. Serving a page in the UI, for example, only uses up a connection while the data is being served. The longest connections tend to be file/payload downloads by agents. The following functionality is impacted by overloading Apache:
Remedies and Prevention
Discovery schedules do not directly impact agent communication or web UI performance, because they do not involve Apache. However, aggressive discovery schedules can noticeably impact overall server performance.
Script and patch schedules can directly impact agent communication and web UI performance, because they involve Apache. Aggressive script or patch schedules can noticeably degrade overall server performance and also cause problems with agent communication and UI access. Using replication shares removes most of this impact. Aggressive scheduling can still cause issues with replication shares, but the threshold is much higher. When replication shares are in use, the most likely cause of any throughput issue (e.g. schedules not completing) is the Task Throughput setting.
By nature, offline scripts have a less visible presence on the server. Keep in mind, however, that they can still impact performance through results uploads: if an offline script runs very frequently, the compounded results uploads in a large environment can max out Apache threads.
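To see how upload frequency adds up, the sketch below applies Little's law (average concurrency = arrival rate × average connection duration) to estimate how many Apache connections results uploads occupy at once. It is a back-of-the-envelope model, not how the SMA itself measures load: it assumes uploads are spread evenly across the script's run interval, and the fleet size, run interval, and 10-second upload duration in the example are hypothetical values. The 800-connection figure is the limit stated in this article.

```python
# Back-of-the-envelope estimate of Apache connections held by agent uploads.
# Assumption: uploads are spread evenly over the run interval, so by Little's law
# average concurrency = arrival rate x average time each upload holds a connection.

APACHE_CONNECTION_LIMIT = 800  # simultaneous-connection limit stated in this article


def estimated_concurrent_uploads(agents: int, run_interval_s: float,
                                 upload_duration_s: float) -> float:
    """Average number of connections occupied by results uploads at any moment."""
    if run_interval_s <= 0:
        raise ValueError("run_interval_s must be positive")
    arrival_rate = agents / run_interval_s  # uploads per second across the fleet
    return arrival_rate * upload_duration_s


def share_of_limit(concurrent: float, limit: int = APACHE_CONNECTION_LIMIT) -> float:
    """Fraction of the connection limit consumed by the estimated load."""
    return concurrent / limit


if __name__ == "__main__":
    # Hypothetical fleet: 15,000 agents running an offline script every 5 minutes,
    # with each results upload assumed to hold a connection for 10 seconds.
    load = estimated_concurrent_uploads(15_000, 300, 10)
    print(f"~{load:.0f} concurrent uploads, {share_of_limit(load):.1%} of the limit")
```

With these example inputs the estimate is 500 concurrent uploads, 62.5% of the limit, before counting any UI traffic or payload downloads. Halving how often the script runs halves the estimate, which is why run frequency is the first lever to check.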
Multi-org: System UI | Settings | General Settings
Single-org: Admin UI | Settings | Provisioning | Communication Settings
Task Throughput is a multiplier used by Konductor, the service on the SMA that acts as the ‘brain’ for agent tasks. The default value of 3 can be stepped up 1 level every 15 minutes, but it should only be increased if the server's load average remains below 10 and the server cannot keep up with the number of ‘Ready to Run’ tasks. If the server is sending out tasks fast enough to keep the queue empty, do not increase this value. Use the lowest setting that works in your environment.
The total number of tasks depends on the number of patch schedules and scripts and on the number of agents in production that they target. The agent interval also affects how frequently several of these tasks are scheduled and sent to all agents, which in turn affects the total number of tasks the server needs to send to agents and the throughput required to complete them.
Task Throughput should not be increased above 5 without consulting KACE Support.
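The stepping guidance above can be expressed as a small decision function. This is a hypothetical helper for reasoning about when a change is justified, not an SMA API: the `Observation` fields are assumptions, and treating a non-empty ‘Ready to Run’ queue as "cannot keep up" is a simplification (in practice, look for a backlog that persists across several checks). The load average and queue size would come from your own monitoring and the Agent Task Status page.

```python
from dataclasses import dataclass

# Thresholds taken from the guidance in this article.
MAX_WITHOUT_SUPPORT = 5           # do not exceed without consulting KACE Support
MIN_STEP_INTERVAL_S = 15 * 60     # step up at most 1 level every 15 minutes
LOAD_AVERAGE_CEILING = 10.0       # only increase while load average stays below 10


@dataclass
class Observation:
    """Hypothetical snapshot gathered from your own monitoring."""
    load_average: float
    ready_to_run: int                 # tasks waiting in 'Ready to Run' status
    seconds_since_last_change: float


def next_throughput(current: int, obs: Observation) -> int:
    """Return the Task Throughput value to use next (never decreases it)."""
    if obs.ready_to_run == 0:
        return current  # queue stays empty: the server keeps up, do not increase
    if obs.load_average >= LOAD_AVERAGE_CEILING:
        return current  # server is already busy
    if obs.seconds_since_last_change < MIN_STEP_INTERVAL_S:
        return current  # wait at least 15 minutes between steps
    if current >= MAX_WITHOUT_SUPPORT:
        return current  # going above 5 requires KACE Support
    return current + 1
```

For example, at the default of 3 with a load average of 4 and a backlog still present 20 minutes after the last change, the function returns 4. It deliberately never lowers the value; stepping back down to the lowest setting that works remains a manual judgment.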
Multi-org: System UI | Settings | Logs
Single-org: Admin UI | Settings | Logs
[2017-10-23 17:21:57 -0500] Konductor [main] stats [s:21874 t:8 tc:8 c:8 cc:738 sl:30 tpl:40 at:399 rt:400 lt:3 lv:0.31]
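When reviewing many of these lines, it can help to split them into fields. The sketch below parses a Konductor stats line like the example above into a dictionary. It assumes the bracketed `key:value` layout shown in the sample. This article documents only t (tasks launched), tc/c (tasks completed), at (active threads), and rt (reserved threads); the remaining keys are kept as raw numbers without interpretation.

```python
import re

# Matches lines like:
# [2017-10-23 17:21:57 -0500] Konductor [main] stats [s:21874 t:8 ... lv:0.31]
STATS_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+Konductor\s+\[main\]\s+stats\s+\[(?P<fields>[^\]]*)\]"
)


def parse_konductor_stats(line: str) -> dict:
    """Split a Konductor stats line into its timestamp and numeric fields."""
    match = STATS_RE.match(line.strip())
    if match is None:
        raise ValueError("not a Konductor stats line")
    stats: dict = {"timestamp": match.group("timestamp")}
    for pair in match.group("fields").split():
        key, _, value = pair.partition(":")
        # Values such as lv:0.31 are fractional; the rest are integers.
        stats[key] = float(value) if "." in value else int(value)
    return stats


if __name__ == "__main__":
    sample = ("[2017-10-23 17:21:57 -0500] Konductor [main] stats "
              "[s:21874 t:8 tc:8 c:8 cc:738 sl:30 tpl:40 at:399 rt:400 lt:3 lv:0.31]")
    s = parse_konductor_stats(sample)
    print(s["t"], s["tc"], s["at"], s["rt"])  # 8 8 399 400
```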
Agent Task Status
Multi-org: System UI | Settings | Support | Display Agent Task Status
Single-org: Admin UI | Settings | Support | Display Agent Task Status
Problem: Agents are not receiving tasks, inventories are taking a long time, etc. Tasks launched (t) and tasks completed (tc and/or c) are equal or close to equal, but active threads (at) and/or reserved threads (rt) are much lower than the standard idle level of 400 each.
Cause: Offline scripts are running too frequently, and active threads are being consumed by agents uploading offline script results.
Problem: Agent tasks are not being sent fast enough, and they are backing up in the agent task list. Many tasks are in “Ready to Run (connected)” status, and the active/reserved threads (at/rt) show many available threads.
Cause: Task Throughput may need to be increased; the server is handling the load but cannot hand tasks out fast enough to meet demand.
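The two patterns above can be turned into a rough triage check over a parsed stats line (a dictionary with the t, tc/c, at, and rt keys) plus the ‘Ready to Run’ count from the Agent Task Status page. The 50% cutoff for "much lower than 400", the 10% tolerance for "close to equal", and the backlog threshold of 100 tasks are hypothetical values chosen for illustration, not KACE-defined limits; tune them to your environment and treat the result as a hint rather than a diagnosis.

```python
IDLE_THREAD_LEVEL = 400  # standard idle level for both at and rt, per this article


def triage(stats: dict, ready_to_run: int,
           low_fraction: float = 0.5, backlog_threshold: int = 100) -> str:
    """Match Konductor stats against the two Problem/Cause patterns above.

    low_fraction and backlog_threshold are illustrative assumptions.
    """
    launched = stats["t"]
    completed = stats.get("tc", stats.get("c", 0))
    # Tasks launched and completed are equal or close to equal (within 10%).
    balanced = abs(launched - completed) <= max(1, 0.1 * launched)
    # Active and/or reserved threads are well below the idle level.
    threads_low = min(stats["at"], stats["rt"]) < IDLE_THREAD_LEVEL * low_fraction

    if balanced and threads_low:
        return "offline-script uploads may be consuming threads; review script frequency"
    if ready_to_run >= backlog_threshold and not threads_low:
        return "threads are available but tasks are backing up; consider raising Task Throughput"
    return "no known pattern matched"
```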
The impact of scheduled reports on server performance depends on the number and frequency of scheduled reports and on the size of the data set being queried.
The impact of smart labels on server performance can be fairly substantial. Device and User smart labels (including LDAP labels) are processed when a device uploads inventory data or a user logs in, respectively. This means every Device smart label enabled on the server runs against every inventory upload. For example, 100 Device smart labels enabled in an environment with 15,000 devices would cause 1,500,000 smart label queries (100 × 15,000) during each inventory cycle. Similarly, all 100 User smart labels would be evaluated each time a user logs in.
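Because this cost scales linearly with both label count and device count, it is worth estimating before enabling new labels. The sketch below computes Device smart label evaluations per inventory cycle and per hour. The per-hour figure assumes every device uploads inventory exactly once per interval, which is a simplification; the function names and the 4-hour interval are hypothetical example inputs.

```python
def device_label_queries_per_cycle(device_labels: int, devices: int) -> int:
    """Each enabled Device smart label runs against every inventory upload."""
    return device_labels * devices


def device_label_queries_per_hour(device_labels: int, devices: int,
                                  inventory_interval_min: float) -> float:
    """Assumes each device uploads inventory once per interval (a simplification)."""
    cycles_per_hour = 60 / inventory_interval_min
    return device_label_queries_per_cycle(device_labels, devices) * cycles_per_hour


if __name__ == "__main__":
    print(device_label_queries_per_cycle(100, 15_000))      # 1500000, the example above
    print(device_label_queries_per_hour(100, 15_000, 240))  # 375000.0 at a 4-hour interval
```

The same arithmetic shows the payoff of cleanup: disabling 20 unused labels in this environment removes 300,000 evaluations per cycle.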
Even though ticket rules mostly apply to the helpdesk, they can access data in other areas of the database, which can impact performance in those areas as well.
When possible, use the wizards in the UI. Custom SQL is an expert-level option we give the customer; it is extremely powerful, but also very dangerous.
When contacting KACE Support, it is important to identify the symptoms of the performance issue as precisely as possible. Time can also be saved by preemptively pulling server logs from the appliance, as these are typically required to diagnose the issue. To pull server logs:
On a multi-org appliance:
System UI | Settings | Support | Retrieve appliance activity logs
On a single-org appliance:
Admin UI | Settings | Support | Retrieve appliance activity logs
This will download a server log package in .tgz format that KACE Support may request to assist in diagnosing the issue.
In addition to pulling logs, make sure a valid backup (both the base and differential files, labeled ‘base’ and ‘incr’) has been offloaded from the appliance to a local or network share in your environment.
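If backups are offloaded to a share you can reach from a workstation, a quick script can confirm that both parts of the backup set are present before you contact Support. This sketch assumes the offloaded file names contain the strings ‘base’ and ‘incr’, as the labels above suggest; the exact naming, the share path, and the function name are hypothetical. It only checks for non-empty files, not whether a backup is restorable.

```python
import sys
from pathlib import Path


def backup_set_present(share: Path) -> dict:
    """Report whether non-empty 'base' and 'incr' backup files exist in share."""
    files = [p for p in share.iterdir() if p.is_file() and p.stat().st_size > 0]
    names = [p.name.lower() for p in files]
    return {
        "base": any("base" in name for name in names),
        "incr": any("incr" in name for name in names),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    result = backup_set_present(Path(sys.argv[1]))  # e.g. a mounted backup share
    missing = [kind for kind, found in result.items() if not found]
    print("backup set complete" if not missing else f"missing: {', '.join(missing)}")
```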