KACE SMA Performance Considerations and Best Practices (4210034)

Description

This document refers to the ‘Admin UI’ and ‘System UI’ of the Systems Management Appliance throughout. To access either interface, append /adminui or /systemui to the URL of the KACE Systems Management Appliance (SMA). Examples:
Admin UI: https://kacesma.domain.loc/adminui
System UI: https://kacesma.domain.loc/systemui
Note: Attempting to access the System UI on a single-org SMA will simply redirect to the Admin UI. For multi-org systems, settings exist throughout both Admin UI (per Org) and System UI (globally).
When troubleshooting performance issues on a KACE appliance, it is extremely important to understand the symptoms of the problem. When contacting KACE Support, the more precisely we can identify the issue, the faster we can resolve it.
Resolution
For more information on this topic, please see the KACE-SMA Course 2 Installing the KACE SMA Agent-Web-based Training.
Also, a KACE Support Webinar was delivered on Oct. 27, 2017 covering the details discussed in this article. The recording is available at https://support.quest.com/kb/234300
The topics listed in this article pertain to performance considerations and best practices for the KACE Systems Management Appliance. It is a living document, so keep it bookmarked and check back often.
System Requirements
First, ensure minimum requirements are met for virtual appliances, and ensure physical hardware is currently supported.
Virtual Appliance
https://www.quest.com/products/kace-systems-management-appliance
Specifications | Virtual Appliance Technical Specifications | Virtual machine system requirements
Physical Appliance
https://www.quest.com/products/kace-systems-management-appliance
Specifications | Physical Appliance Technical Specifications | Hardware Specifications

Agent Intervals
Multi-org: System UI | Organizations | [ORG Name]
Single-org: Admin UI | Settings | Provisioning | Communication Settings
- The maximum recommended number of connections per hour is 500 (including agents from ALL Orgs).
- Set each interval as conservatively as possible. Example: if inventory results do not need to update more than once per day, set Agent Inventory to 1 Day. Setting this value to 4 hours in that case would put unnecessary stress on the server and degrade performance.
Apache Limitations
The SMA utilizes Apache for all agent and web UI activity, and Apache has a hard-coded limit of 800 simultaneous connections. Apache connections are typically very quick and temporary. Serving a page in the UI, for example, only uses up a connection while the data is being served. The longest connections tend to be file/payload downloads by agents. The following functionality is impacted by overloading Apache:
- UI Access – Helpdesk users and admins impact this threshold by constant use of the web UI.
- Agent Intervals – Setting agent intervals too frequently can cause Apache to reach its connection limit.
- Scripting – Online scripts and their payloads are pulled by agents at scheduled run-time. Having too many online scripts scheduled too frequently can negatively impact performance of the UI, stop or delay agent activity, and generally impact overall server performance.
- Patching Schedules – Patching schedules run similarly to online scripts, so the previously mentioned info about Scripting applies here.
Remedies and Prevention
- Replication Shares – Replication shares are the best option for reducing agent traffic to and from the SMA. They take over the work of serving files to agents: each agent configured to use a replication share pulls its script downloads, payloads, patch files, etc. from the share instead of from the SMA.
- Agent Intervals – Lengthening agent intervals reduces the connections per hour, which reduces the load on the SMA's Apache server.
- Consider using "Enable Webserver Compression". Compression lets the web server compress web pages, which decreases loading time in your browser. It adds slight overhead to the web server, but it can noticeably improve page loading times. Check and verify all of the items above for Apache performance before enabling this setting.
  - Multi-org: System UI | Settings | Control Panel | Security Settings
  - Single-org: Admin UI | Settings | Control Panel | Security Settings
Discovery Schedules
Discovery schedules do not directly impact agent communication or web UI performance, because they do not involve Apache. However, having aggressive discovery schedules can have a noticeable impact on overall server performance.
Script Schedules / Patch Schedules
Script and patch schedules can directly impact agent communication and web UI performance, because they involve Apache. Aggressive script or patch schedules can have a noticeable impact on overall server performance and can cause issues with agent communication and UI access. The best way to remove most of this impact is to use replication shares. Aggressive scheduling can still cause issues with replication shares, but the threshold is much higher. When using replication shares, the most likely cause of any throughput issues (e.g. schedules not completing) is the task throughput setting.
By nature, offline scripts have less of a visible presence on the server. They can still impact performance through results uploads: if an offline script runs very frequently in a large environment, the combined results uploads can max out Apache threads.
Task Throughput / Load Average
Multi-org: System UI | Settings | General Settings
Single-org: Admin UI | Settings | Provisioning | Communication Settings
Task Throughput is a multiplier used by the Konductor service (the ‘brain’ for agent tasks) on the SMA. The default value of 3 can be stepped up 1 level every 15 minutes, but it should only be increased if the load average on the server remains below 10 and the server cannot keep up with the number of ‘ready to run’ tasks. If the server is sending out tasks fast enough to maintain an empty queue, this value should not be increased. Use the lowest setting that works in the environment.
The total number of tasks depends on the number of patch schedules and scripts and on the number of agents in production. The agent interval also affects how frequently several of these tasks are scheduled and sent to all agents, which in turn affects the total number of tasks the server needs to send and the throughput required to complete them.
Task Throughput should not be increased above 5 without consulting KACE Support.
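The guidance above can be read as a simple decision rule. The following Python sketch is illustrative only: the function name, its parameters, and the choice to treat any non-empty ‘ready to run’ queue as a backlog are assumptions made for this example, not part of the SMA. The load ceiling of 10 and the cap of 5 come from this article.

```python
LOAD_CEILING = 10.0        # load average must stay below this (from the article)
MAX_WITHOUT_SUPPORT = 5    # do not exceed without consulting KACE Support


def recommend_task_throughput(current: int, load_average: float,
                              ready_to_run_backlog: int) -> int:
    """Return the Task Throughput value suggested by the guidance above.

    ready_to_run_backlog is a hypothetical input: the number of 'ready to
    run' tasks the server is failing to hand out. An empty queue means the
    server is keeping up, so the value is left alone.
    """
    falling_behind = ready_to_run_backlog > 0
    if (falling_behind
            and load_average < LOAD_CEILING
            and current < MAX_WITHOUT_SUPPORT):
        # Step up one level only; wait 15 minutes before evaluating again.
        return current + 1
    return current


print(recommend_task_throughput(3, 2.5, 120))   # backlog, low load
print(recommend_task_throughput(3, 12.0, 120))  # load too high
```

The rule only ever moves up by one level per call, which mirrors the "1 level every 15 minutes" pacing; it never lowers the value, since when to step back down is not specified here.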
Troubleshooting Agent Tasks
Konductor Log
Multi-org: System UI | Settings | Logs
Single-org: Admin UI | Settings | Logs
[2017-10-23 17:21:57 -0500] Konductor[1384] [main] stats [s:21874 t:8 tc:8 c:8 cc:738 sl:30 tpl:40 at:399 rt:400 lt:3 lv:0.31]
- s = seconds konductor has been active
- t = tasks launched
- tc = tasks completed
- c = tasks completed (same as tc)
- cc = calls completed (konductor refreshes the task list)
- sl = sleep interval in seconds (delay between loops)
- tpl = tasks distributed by konductor per loop
- at = active threads (400 max, number decreases as threads are used)
- rt = reserved threads (400 max, number decreases as threads are used)
- lt = target load (this will match the current task throughput setting)
- lv = actual load for past 1 minute (should never exceed 10, the lower the better)
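When reviewing many stats lines, it can help to split them into named fields. The sketch below parses the sample line shown above using only the Python standard library. The function name and the regular expression are assumptions for this example, and the format is inferred from that single sample line, so other Konductor log lines may not match.

```python
import re

SAMPLE = ("[2017-10-23 17:21:57 -0500] Konductor[1384] [main] stats "
          "[s:21874 t:8 tc:8 c:8 cc:738 sl:30 tpl:40 at:399 rt:400 lt:3 lv:0.31]")

# Capture everything between "stats [" and the closing "]".
STATS_RE = re.compile(r"stats \[([^\]]*)\]")


def parse_konductor_stats(line: str) -> dict[str, float]:
    """Return the key:value pairs of a Konductor stats line as a dict."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError("not a Konductor stats line")
    stats = {}
    for pair in match.group(1).split():
        key, _, value = pair.partition(":")
        stats[key] = float(value)
    return stats


stats = parse_konductor_stats(SAMPLE)
print(stats["at"], stats["rt"], stats["lv"])  # 399.0 400.0 0.31
```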
Agent Tasks Status
Multi-org: System UI | Settings | Support | Display Agent Task Status
Single-org: Admin UI | Settings | Support | Display Agent Task Status
- All Tasks – displays all tasks in the agent task queue
- In Progress – displays all tasks sent to agents that have not completed
- Ready to Run (connected) – displays tasks ready to run with active agents connected and waiting
- Ready to Run – displays tasks ready to run but agents not connected
- Longer than 10 minutes – displays tasks that have been running longer than 10 minutes
- Task Type – allows filtering based on task type (e.g. inventory, specific script, etc.)
- Organization – allows filtering by organization
Example 1:
Problem: Agents are not receiving tasks, inventories are taking a long time, etc. Tasks launched (t) and tasks completed (tc and/or c) are equal or close to equal, but active threads (at) and/or reserved threads (rt) are much lower than the standard idle level of 400 each.
Cause: Offline scripts are running too frequently, and active threads are being consumed by agents uploading offline script results.
Example 2:
Problem: Agent tasks are not being sent fast enough, and they are backing up in the agent task list. Many tasks are in “Ready to Run (connected)” status, and active/reserved threads (at/rt) show many available threads.
Cause: Task throughput may need to be increased, as the server is keeping up with the load but cannot hand tasks out fast enough to keep up with demand.
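The two examples above can be expressed as a rough triage check. This Python sketch is a heuristic only: the thresholds for "much lower", "close to equal", and "many" are assumptions chosen for illustration, and the function and its inputs are hypothetical. Only the idle level of 400 threads comes from this article.

```python
IDLE_THREADS = 400         # idle level for at and rt, per the article
LOW_THREAD_FRACTION = 0.5  # assumption: "much lower" = under half of idle
BACKLOG_THRESHOLD = 50     # assumption: "many" Ready to Run (connected) tasks


def triage(stats: dict, ready_to_run_connected: int) -> str:
    """Map Konductor stats plus a queue count onto Example 1 or Example 2.

    stats holds the t, tc, at and rt values from a Konductor stats line;
    ready_to_run_connected is read from the Agent Tasks Status page.
    """
    threads_low = min(stats["at"], stats["rt"]) < IDLE_THREADS * LOW_THREAD_FRACTION
    # Assumption: "equal or close to equal" means within one task.
    tasks_keeping_up = abs(stats["t"] - stats["tc"]) <= 1

    if threads_low and tasks_keeping_up:
        return "example-1"  # threads consumed, e.g. by offline script uploads
    if not threads_low and ready_to_run_connected >= BACKLOG_THRESHOLD:
        return "example-2"  # queue backing up; task throughput may be too low
    return "no-match"


print(triage({"t": 8, "tc": 8, "at": 120, "rt": 400}, 0))
print(triage({"t": 8, "tc": 8, "at": 399, "rt": 400}, 200))
```

A "no-match" result means neither pattern applies under these assumed thresholds; it does not mean the server is healthy.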
Scheduled Reports
The impact of scheduled reports on server performance depends on the number and frequency of scheduled reports and on the size of the data set being queried.
Tips:
- The reporting wizard should produce efficient queries, but it is possible to create a very large dataset that can take several minutes to run.
- Report on the smallest data set needed. Example: to find Windows machines running a specific version of Windows that are missing a software title, filter the report to show only Windows systems running that version that meet the criteria.
- Schedule reports the same way as agent task intervals: run them only as often as the need requires. Example: if a daily report is sufficient, do not schedule it hourly.
Smart Labels
The impact of smart labels on server performance can be fairly substantial. Device and User smart labels (including LDAP labels) are processed when a device uploads inventory data or a user logs in, respectively. This means every Device smart label enabled on the server runs against every inventory upload. For example, 100 smart labels enabled in an environment with 15,000 devices would cause 1,500,000 smart label queries (100 × 15,000) throughout each inventory cycle. Similarly, 100 User smart labels would be evaluated at each user login.
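Because every enabled Device smart label runs against every inventory upload, the query count per inventory cycle is the product of the two numbers. The short Python sketch below works through that arithmetic; the function name is an assumption for this example, and it assumes each device uploads inventory once per cycle.

```python
def smart_label_queries_per_cycle(enabled_device_labels: int, devices: int) -> int:
    """Each enabled Device smart label is evaluated once per inventory upload."""
    return enabled_device_labels * devices


print(smart_label_queries_per_cycle(100, 15_000))  # 1500000
print(smart_label_queries_per_cycle(25, 15_000))   # 375000
```

Disabling or consolidating labels reduces the total in direct proportion, which is why trimming unused labels is worthwhile in large environments.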
Tips:
- Smart Labels (including LDAP labels) should be as precise as possible.
- Each label should be tested to ensure it does not cause lag on the server. A single inefficient label can back up queries on the server, causing lag or in some cases even halting the flow of inventory data into the database.
- LDAP labels should be as restrictive in LDAP as possible. The base DN should be as deep into the tree as possible, and the advanced search filter should be very precise to only return the required data.
Ticket Rules
Even though ticket rules tend to apply to the helpdesk, they can access data in other areas of the database, which can impact performance in those areas.
Tips:
- Ticket rules should always be tested prior to full implementation.
- Ticket rules are run against every ticket in a queue when the ticket is saved or modified by email.
- Each rule shows its last run, and how long that run took, at the bottom of the page. Keep these run times in mind: the total time of all rules for a queue determines how long a ticket save takes to process.
Custom SQL (Ticket Rules, Reports, Smart Labels)
Use the wizards in the UI when possible. Custom SQL is an expert-level option offered to customers; it is extremely powerful but also very dangerous.
- All custom SQL should be tested prior to implementation (MySQL Workbench, etc.).
- Select only the fields required; using SELECT * returns a larger dataset than needed.
- Due to the nature of custom SQL, it is possible to write a query that has a major impact on server performance.
- Support does not offer assistance with custom SQL, but KACE Professional Services is available as a fee-based offering for any custom SQL needs.
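To illustrate the point about selecting only required fields, the Python sketch below compares SELECT * with a narrow, filtered query. It runs against an in-memory SQLite database as a stand-in for testing outside the appliance; the SMA itself uses MySQL, and the table, column names, and data here are hypothetical and are not meant to match the appliance schema.

```python
import sqlite3

# In-memory stand-in database; nothing here touches a real appliance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MACHINE (ID INTEGER PRIMARY KEY, NAME TEXT, OS_NAME TEXT, NOTES TEXT)"
)
conn.executemany(
    "INSERT INTO MACHINE (NAME, OS_NAME, NOTES) VALUES (?, ?, ?)",
    [
        ("pc-01", "Windows 10", "x" * 1000),
        ("pc-02", "Windows 11", "y" * 1000),
        ("mac-01", "macOS", "z" * 1000),
    ],
)

# Wide query: every column of every row, including the large NOTES field.
wide = conn.execute("SELECT * FROM MACHINE").fetchall()

# Narrow query: only the needed column, only the rows that match.
narrow = conn.execute(
    "SELECT NAME FROM MACHINE WHERE OS_NAME = ?", ("Windows 10",)
).fetchall()

wide_chars = sum(len(str(value)) for row in wide for value in row)
narrow_chars = sum(len(str(value)) for row in narrow for value in row)
print(len(wide), wide_chars)      # rows and characters returned by SELECT *
print(len(narrow), narrow_chars)  # rows and characters returned by the narrow query
```

The same habit applies to custom SQL in ticket rules, reports, and smart labels: name the columns and filter as tightly as the requirement allows.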
Contacting KACE Support
When contacting KACE Support, it is important to identify the symptoms of the performance issue as precisely as possible. Time can also be saved by preemptively pulling server logs from the appliance, as these are typically required to diagnose the issue. To pull server logs:
On a multi-org appliance:
System UI | Settings | Support | Retrieve appliance activity logs
On a single-org appliance:
Admin UI | Settings | Support | Retrieve appliance activity logs
This will download a server log package in .tgz format that KACE Support may request to assist in diagnosing the issue.
In addition to pulling logs, it is important to ensure a valid backup (both base and differential files – labeled as ‘base’ and ‘incr’) has been offloaded from the appliance to a local or network share in your environment.
