An anomaly is any data point or suspicious event that stands out from the baseline or expected pattern. When data unexpectedly deviates from the established dataset, it can be an early sign of system malfunction, a breach, or backup configuration changes.
Examples include unexpected data deletions, modifications, or excessive data insertions.
Anomalies do not always signify an issue, but each one is worth investigating to understand why the deviation occurred and whether it is a valid data point compared with the baseline or training set.
Importance
- Identify ransomware attacks sooner
- Limit downtime
- Detect data loss early and limit it
- Better understanding of changes in the environment
Challenges
Anomaly detection is only valuable if it finds true anomalies, which means the system must be trained before it is useful. Otherwise, the system can raise more alerts/anomalies than anyone could feasibly investigate.
Retraining the anomaly detection system re-establishes the baseline.
Anomaly detection approach
To detect anomalies, the required data is collected periodically. This data is used to train the model and to detect anomalies.
Currently, QoreStor collects the required data at 5-minute intervals.
Data collection starts once QoreStor is installed or upgraded to 7.4.0. Data collected in the first 3 months (90 days) is used to establish a baseline. Once the baseline is established, anomalies are detected against it. After that, the baseline is re-established every 30 days using the last 90 days of data.
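The timeline above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the 5-minute, 90-day, and 30-day figures from the text; the function names and the example start date are hypothetical and are not part of QoreStor.

```python
from datetime import datetime, timedelta

COLLECTION_INTERVAL = timedelta(minutes=5)  # collection interval from the text
TRAINING_WINDOW = timedelta(days=90)        # initial training period from the text
RETRAIN_PERIOD = timedelta(days=30)         # re-baselining period from the text


def samples_per_window() -> int:
    """Number of 5-minute samples in one full 90-day training window."""
    return TRAINING_WINDOW // COLLECTION_INTERVAL


def baseline_schedule(collection_start: datetime, retrain_count: int = 3):
    """Return (first_baseline_time, list_of_retraining_times)."""
    first_baseline = collection_start + TRAINING_WINDOW
    retrains = [first_baseline + RETRAIN_PERIOD * i
                for i in range(1, retrain_count + 1)]
    return first_baseline, retrains


# Hypothetical install/upgrade date, used only for illustration.
start = datetime(2024, 1, 1)
first, retrains = baseline_schedule(start)
print(samples_per_window())            # 25920
print(first.date())                    # 2024-03-31
print([r.date().isoformat() for r in retrains])
```

With a start date of 1 January 2024, the first baseline would be available on 31 March 2024, and each later baseline would reuse the most recent 90 days of samples.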
Anomaly detection categories
Anomaly detection is divided into the following categories: System-level, Container-level, and Storage-group level.
System-level
Anomalies: The following anomalies are detected:
- System-level login authentication failures/anomalies
- QoreStor UI authentication failures/anomalies
- Protocol (OST/RDS) authentication failures/anomalies
- OS audit process stopped – this anomaly is reported as soon as the OSAudit process stops, or stops or pauses logging to audit files because of low space or any other issue
Authentication-related anomalies do not need training because thresholds are used. For instance, if login authentication for the “root” user fails 3 or more times from a host “abc”, it is reported as an anomaly.
CLI: The ‘system’ CLI can be used to configure the above anomaly detections. Please refer to the QoreStor CLI Reference guide for more information.
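The threshold rule can be illustrated with a minimal Python sketch. It assumes failed logins arrive as (client, user, timestamp) tuples and uses the threshold of 3 from the example above; the function name, the record layout, and the sample events are hypothetical and do not describe how QoreStor implements the check.

```python
from collections import defaultdict
from datetime import datetime

FAILURE_THRESHOLD = 3  # value taken from the "root"/"abc" example in the text


def find_auth_anomalies(failures, threshold=FAILURE_THRESHOLD):
    """Group failed logins by (client, user) and report groups at or above threshold.

    `failures` is an iterable of (client_name, user_name, timestamp) tuples.
    """
    grouped = defaultdict(list)
    for client, user, ts in failures:
        grouped[(client, user)].append(ts)

    report = []
    for (client, user), stamps in sorted(grouped.items()):
        if len(stamps) >= threshold:
            report.append({
                "client_name": client,
                "user_name": user,
                "failed_count": len(stamps),
                "failed_start": min(stamps),
                "failed_end": max(stamps),
            })
    return report


# Illustrative events: three failures for root from host abc, one for admin from xyz.
events = [
    ("abc", "root", datetime(2024, 5, 1, 10, 0)),
    ("abc", "root", datetime(2024, 5, 1, 10, 1)),
    ("abc", "root", datetime(2024, 5, 1, 10, 2)),
    ("xyz", "admin", datetime(2024, 5, 1, 11, 0)),
]
anomalies = find_auth_anomalies(events)
print(anomalies)
```

Only the root/abc pair is reported here, because the single admin failure stays below the threshold.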
Report: To identify system-level anomalies, the report shows the following fields:
- Client-name – Client from which the failed authentication originated
- User name – Username used in the authentication attempt
- Failed count – Number of failed attempts
- Failed start/end time – The period during which the failed attempts occurred
Container-level
QoreStor detects anomalies related to data ingest, data overwrites, and data expiry at the container level. For this, the following metrics are collected on each container at regular intervals.
Ingest and Overwrite: Detects the backup pattern and data size.
- Number of bytes ingested into this container across clients within regular intervals
- Number of bytes overwritten across clients within regular intervals
Expiry
Files-deleted – Number of files/images deleted across clients. The total size of all deleted files is tracked internally. Data collected over 30-minute intervals is used for anomaly detection.
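As an illustration of the 30-minute interval used for expiry, the sketch below rolls individual deletion samples up into 30-minute buckets. The sample layout (timestamp, files deleted, bytes deleted) and all names are assumptions made for this example; the source does not describe the internal data format.

```python
from collections import defaultdict
from datetime import datetime

BUCKET_MINUTES = 30  # expiry detection interval from the text


def bucket_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 30-minute bucket."""
    minute = (ts.minute // BUCKET_MINUTES) * BUCKET_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def aggregate_deletions(samples):
    """Sum files deleted and bytes deleted per 30-minute bucket.

    `samples` is an iterable of (timestamp, files_deleted, bytes_deleted).
    Returns {bucket_start: (total_files, total_bytes)} in time order.
    """
    totals = defaultdict(lambda: [0, 0])
    for ts, files, size in samples:
        key = bucket_start(ts)
        totals[key][0] += files
        totals[key][1] += size
    return {key: tuple(value) for key, value in sorted(totals.items())}


# Illustrative samples only.
samples = [
    (datetime(2024, 5, 1, 10, 0), 2, 100),
    (datetime(2024, 5, 1, 10, 5), 1, 50),
    (datetime(2024, 5, 1, 10, 30), 4, 400),
]
buckets = aggregate_deletions(samples)
print(buckets)
```

The first two samples fall into the 10:00 bucket and the third into the 10:30 bucket, so each bucket carries both a file count and a total size.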
NOTE: Even if containers are removed, their anomalies can still be queried through the CLI or UI.
CLI: The container CLI can be used to tune or set anomaly detection metrics. Please refer to the QoreStor CLI Reference guide for more information.
Anomaly settings applied at the Storage Group level are automatically applied to all containers in that group unless explicitly disabled at the individual container level.
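The inheritance rule can be sketched as follows. This is a simplified model, assuming each container either inherits the storage group setting (None) or explicitly turns detection off (False); the function name, the override representation, and the container names are hypothetical.

```python
from typing import Dict, Optional


def effective_settings(group_enabled: bool,
                       container_overrides: Dict[str, Optional[bool]]) -> Dict[str, bool]:
    """Resolve whether anomaly detection is active for each container.

    None means "inherit from the storage group"; False means the container
    explicitly turned detection off. This tri-state model is an assumption.
    """
    resolved = {}
    for name, override in container_overrides.items():
        if override is False:
            resolved[name] = False          # explicit container-level opt-out
        else:
            resolved[name] = group_enabled  # inherited from the storage group
    return resolved


# Hypothetical container names.
containers = {"backup-01": None, "backup-02": False}
print(effective_settings(True, containers))
# {'backup-01': True, 'backup-02': False}
```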
Report: The container-level anomaly report shows the following anomaly types:
- Ingest – Bytes ingested and the corresponding savings (where not in line with the training period/dataset)
- Overwrite – Total bytes overwritten by backups (not expected as per the training set)
- Expiry – Number of files deleted and the sum of all file sizes (not expected as per the training set)
- Start/end time – The period during which the anomaly occurred in the container
Storage-group level
At the storage group level, savings anomalies are detected. Savings are further classified into the following sub-categories:
- Savings – dedupe – Reported if total post-dedupe bytes are outside the training range
- Savings – compression – Reported if total post-compression bytes are outside the training range
CLI: The storage_group CLI can be used to set anomaly detection metrics at the storage group level. Please refer to the storage_group command in the QoreStor Command Line Reference Guide.
Report: The storage group anomaly report shows the following metrics:
- Anomaly type – Savings
- Anomaly sub-type – deduplication or compression
To decide whether a value is an anomaly, the following parameters are used:
- Current value of deduplication or compression bytes
- Minimum and maximum expected values, i.e., the expected range
- Difference from the range, i.e., from the nearest minimum or maximum value
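The three parameters above map naturally onto a simple range check. The sketch below is one possible reading, assuming the difference is reported as a non-negative distance to the nearest bound and is zero inside the range; the function name and the returned field names are hypothetical.

```python
def evaluate_savings(current: int, expected_min: int, expected_max: int) -> dict:
    """Compare a current byte count with the expected (trained) range."""
    if expected_min > expected_max:
        raise ValueError("expected_min must not exceed expected_max")

    if current < expected_min:
        difference = expected_min - current   # below the range
    elif current > expected_max:
        difference = current - expected_max   # above the range
    else:
        difference = 0                        # inside the expected range

    return {
        "current": current,
        "expected_range": (expected_min, expected_max),
        "difference": difference,
        "is_anomaly": difference > 0,
    }


print(evaluate_savings(120, 50, 100))  # 20 above the maximum, flagged
print(evaluate_savings(75, 50, 100))   # inside the range, not flagged
```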
Retraining
Automatic retraining: Once first-time training is complete (after 3 months of data), retraining happens every month using the last 3 months of data. This brings recent data into the training period and helps tune the baseline. Automatic retraining applies to both containers and storage groups.
Manual retraining: The ocamltrain CLI can be used for on-demand retraining. Please refer to the QoreStor Command Line Reference Guide for more information.
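A sliding training window of this kind can be sketched as below. The example assumes, purely for illustration, that the baseline is the minimum and maximum value seen in the last 90 days; the source does not state how the baseline is actually computed, and the function and variable names are hypothetical.

```python
from datetime import datetime, timedelta

TRAINING_WINDOW = timedelta(days=90)  # "last 3 months of data" from the text


def retrain_baseline(samples, now: datetime):
    """Recompute an expected range from samples inside the training window.

    `samples` is an iterable of (timestamp, value). Returns (min, max), or
    None when no sample falls inside the window. Using min/max as the
    baseline is an assumption made for this sketch.
    """
    cutoff = now - TRAINING_WINDOW
    window = [value for ts, value in samples if cutoff <= ts <= now]
    if not window:
        return None
    return min(window), max(window)


# Illustrative samples: the first is older than 90 days and is ignored.
now = datetime(2024, 6, 1)
samples = [
    (datetime(2024, 1, 15), 900),
    (datetime(2024, 4, 1), 100),
    (datetime(2024, 5, 20), 250),
]
print(retrain_baseline(samples, now))  # (100, 250)
```

Because old samples drop out of the window at each retraining, a permanent change in backup behavior gradually becomes part of the new baseline.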
Alerts
The following alerts/events are raised in relation to anomaly detection:
- When stats collection is not happening
- When the anomaly detection service is not running
- When OS audit stops running while OS authentication anomaly detection is enabled at the system level
Reports
Anomaly reports are shown in the QoreStor UI and can also be queried using the ocamlreport CLI. Email notifications can be configured with the following command so that anomalies are sent as they are detected.
/opt/qorestor/bin/email_anomalies --configure
Please refer to the QoreStor Command Line Reference Guide for more details.