Terminology Primer
This section defines the terms commonly used to describe how repositories operate. The rest of the document uses these terms without further explanation.
Repository file
The folder structure in the repository contains files with events in a special format. The file format stores event field names and field contents for lookup. There is one file per computer per data source per gathering session.
Indexing queue
The indexing queue is a reserved folder with files that reference each repository file. There is one link file per repository file. The reference is contained in the link file name. If the length of the reference string is within the file system limits for file name length, the link file is empty; otherwise, one part of the reference string is in the file name, and the rest is stored as the file’s contents. A link file name starts with a timestamp for easy access to the most recent events.
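The naming rule for link files can be illustrated with a minimal sketch. The timestamp format, the underscore separator, the 255-character limit and the function name are all assumptions made for illustration; the actual format that InTrust uses is not described here. The sketch also ignores characters that are not allowed in file names.

```python
from datetime import datetime

# Hypothetical limit; real file systems and InTrust may use different values.
MAX_NAME_LENGTH = 255


def make_link_file(reference: str, created: datetime,
                   max_name_length: int = MAX_NAME_LENGTH) -> tuple[str, str]:
    """Return (file_name, file_contents) for a link to one repository file."""
    # Sortable timestamp prefix, so the newest links are easy to find by name.
    prefix = created.strftime("%Y%m%d%H%M%S") + "_"
    room = max_name_length - len(prefix)
    if room <= 0:
        raise ValueError("max_name_length is too small for the timestamp prefix")
    # Whatever fits goes into the name; any overflow becomes the file's contents.
    return prefix + reference[:room], reference[room:]
```

With a short reference, the returned contents string is empty, which corresponds to an empty link file.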
Link files are created during gathering for each repository file that arrives. Gathering is completely independent from indexing, and the gathering components do not track how the indexing queue is emptied; they only fill the queue.
Every time indexing is turned off, the contents of the indexing queue are deleted. Until indexing is enabled again, the queue is cleared periodically (recall that gathering always fills the queue).
Repository-managing InTrust server
This is the InTrust server that performs all repository operations. This server is specified in the repository properties. (In InTrust Default Package, the repository properties are accessed through the properties of an associated collection.)
Indexing InTrust server
This is the InTrust server that is selected for processing the index of a particular repository. It is not necessarily the server that hosts the repository. Indexing can even be a dedicated activity for a server; for details, see the Working with Repositories document.
After the repository-managing server has found data that has not been indexed, it asks the indexing server (or the indexing component if both are the same computer) to process the data, gets back the result and writes it to the index.
In InTrust Extended Package, this server is specified in InTrust Manager, in the repository properties, on the Indexing tab. In InTrust Default Package, the indexing server is always the same as the repository-managing server.
Main Repository Indexing
The main repository is the part of the repository that serves as the primary storage for audit data and its index. Prior to InTrust 10.7, it was synonymous with “repository”. InTrust 10.7 introduced the additional hot repository.
Indexing is a continuous activity; it is always on after it has been enabled. During indexing, the indexing InTrust server finds the link to the repository file with the most recent events in the indexing queue and checks if there is already an index entry for the file. If the file is already indexed, the server deletes the link and proceeds to the next link. If there is no matching index entry, the server creates it and deletes the link.
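The queue-processing logic described above can be sketched as follows. This is a simplified in-memory model, not the actual InTrust implementation: the queue is represented as a dictionary of link names to repository file references, the index as a set of references, and the function name is hypothetical.

```python
def process_queue(queue: dict[str, str], index: set[str]) -> list[str]:
    """Drain the queue newest-first; return the references that were newly indexed.

    queue maps link file names (timestamp-prefixed) to repository file references.
    index holds the references that already have an index entry.
    """
    newly_indexed = []
    while queue:
        # Link names start with a timestamp, so the greatest name is the newest.
        link_name = max(queue)
        reference = queue[link_name]
        if reference not in index:
            index.add(reference)  # stand-in for building the real index entry
            newly_indexed.append(reference)
        # The link is deleted whether or not the file was already indexed.
        del queue[link_name]
    return newly_indexed
```

In either branch the link is removed, so the queue shrinks on every iteration until it is empty.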
Notes:
- In the main repository, the indexing queue is stored in the \IndexingRoot$\storages.3 folder.
- When indexing is first enabled for a repository and the queue becomes empty for the first time, a file named storage.complete is created there. The presence of this file indicates that the repository has a managed index. If indexing is disabled, this file is removed along with the queue.
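A check for the marker file might look like the sketch below. The function name is hypothetical, and the caller is assumed to supply the full path to the indexing queue folder for the repository in question.

```python
from pathlib import Path

MARKER_NAME = "storage.complete"  # file name taken from the note above


def has_managed_index(queue_folder: Path) -> bool:
    """Return True if the marker file is present in the indexing queue folder."""
    return (queue_folder / MARKER_NAME).is_file()
```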
Hot Repository Indexing
The hot repository is the part of the repository, introduced in InTrust 10.7, that is used for temporarily storing and aligning data that comes in during real-time collection. Using the main repository for these operations is inefficient, because the main repository structure would require working with large numbers of small files. The hot repository is designed to address this challenge by batching the incoming files and making a specialized index.
In the hot repository, storage is organized into segments. Each segment contains event data for a period of time (one day by default) and is made up of repository file batches. The 24-hour period is a balanced value determined by testing, and it should not be changed unless advised by the InTrust Support team. An index is created per segment, and segments are periodically optimized and merged into the main repository. When a segment is merged, its index is deleted.
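The idea of grouping batches into time-based segments can be sketched as follows. How InTrust actually aligns segment boundaries is not stated here; this sketch assumes they fall on calendar-day boundaries counted from a fixed reference point, and all names in it are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SEGMENT_LENGTH = timedelta(days=1)  # the default period stated in the text
EPOCH = datetime(1970, 1, 1)        # assumed alignment point for segment boundaries


def segment_start(event_time: datetime) -> datetime:
    """Return the start of the segment that covers event_time."""
    whole_segments = (event_time - EPOCH) // SEGMENT_LENGTH
    return EPOCH + whole_segments * SEGMENT_LENGTH


def group_into_segments(batches):
    """Group (event_time, batch_name) pairs by the start of their segment."""
    segments = defaultdict(list)
    for event_time, batch_name in batches:
        segments[segment_start(event_time)].append(batch_name)
    return dict(segments)
```

Each key of the resulting dictionary stands for one segment, which would get its own index until the segment is merged into the main repository.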
Notes:
• Deletion from the hot repository can be delayed if there are searches running on a segment, but this does not affect merging.
• In the hot repository, the indexing queue is stored in the \Hot$\IndexingRoot$\storage.hot folder.
The hot repository has its own independent indexing mechanism. This index exists only as long as the events are still in the hot repository. As soon as they have been moved to the main repository, the index is deleted. The new repository file in the main repository is queued for indexing just like other files.
The indexing process is similar to the one used in the main repository, with the added complexity of working with batches rather than single files.
Important: Indexing of the hot repository is done by the same indexing server that works with the main repository. However, the hot repository index cannot be decoupled from the repository share and put in a different share (as the main repository index can).
Regular Operation
During problem-free continuous indexing of production repositories, the following periodic activity occurs:
· Every minute, the indexing server checks whether the repository share is accessible and its files can be enumerated. The check is done to make sure that any available repository files can be queued for indexing. This is true for both main repository indexing and hot repository indexing.
· As long as indexing is working properly, every hour, an informational message (event ID 13842) is written to the InTrust Server log stating that the index is up to date. Example: “Indexing of recent items for repository "Rep1" successfully completed; index is now up-to-date.” For details about possible errors, see the InTrust Server Events document.
· Every 7 days, the index is cleared of all data that has no matching repository files.
· Every day, data from the hot repository is merged into the main repository.
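The periodic activities listed above can be summarized as a schedule table. The intervals come from the list; the task names and the due_tasks helper are made up for this sketch and do not correspond to actual InTrust settings.

```python
from datetime import timedelta

# Intervals as stated in the list above; the task names are hypothetical.
SCHEDULE = {
    "check_share_access": timedelta(minutes=1),
    "log_index_up_to_date": timedelta(hours=1),
    "merge_hot_into_main": timedelta(days=1),
    "purge_orphaned_index_data": timedelta(days=7),
}


def due_tasks(elapsed_since_last_run: dict[str, timedelta]) -> list[str]:
    """Return the names of tasks whose interval has elapsed, in schedule order."""
    return [
        name
        for name, interval in SCHEDULE.items()
        # A task that has never run (missing key) is treated as due.
        if elapsed_since_last_run.get(name, interval) >= interval
    ]
```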
How Searches Use the Index
When searching in indexed repositories, Repository Viewer tries both indexed and non-indexed searching at once. This is done because the most recent events must be displayed first, and there is no telling in advance whether they are indexed yet. This initial part of the search is the slowest, but after the most recent events have been found, searching picks up speed.
If there is a hot repository in addition to the main repository, Repository Viewer also searches there, both with and without the index. This pattern is similar to searching in the main repository, but the difference is that hot repository searching is deeper due to the increased nesting of data. In such a repository, Repository Viewer tries all four areas during a search:
· Non-indexed search in hot repository
· Indexed search in hot repository
· Non-indexed search in main repository
· Indexed search in main repository
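One way to picture how results from the four areas could be combined, most recent first, is the sketch below. How Repository Viewer actually merges results is not described here. The event shape, the function name and the deduplication step are assumptions, and each area is assumed to return its results already sorted newest-first.

```python
import heapq
from typing import Iterable, Iterator

Event = tuple[str, str]  # (ISO 8601 timestamp, message); a made-up shape


def search_all_areas(areas: dict[str, Iterable[Event]]) -> Iterator[Event]:
    """Merge per-area results, each already sorted newest-first, into one stream."""
    seen = set()
    for event in heapq.merge(*areas.values(), reverse=True):
        # Assumption: an event may surface from both the indexed and the
        # non-indexed path, so duplicates are dropped.
        if event not in seen:
            seen.add(event)
            yield event
```

Because the merge is lazy, the newest events can be yielded before the slower areas have returned all of their results.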