To understand this scenario, it is necessary to explain the deduplication process. Data is stored in the Chunk Store in logical units called pages. The deduplicator splits the stream into chunks, and for each chunk a query against the Chunk Index determines whether that chunk is already stored in a page in the Chunk Store. If the chunk is not present, it is added to the currently open page (which will in turn be added to the Chunk Store) and a new Chunk Index entry is created. The chunk is then recorded in the Manifest for the stream, and processing of that chunk is complete.
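This flow can be sketched roughly as follows. It is a minimal illustration only, not the product's actual code: the names (ChunkStore, chunk_index, manifests, PAGE_CAPACITY) and the fixed-size chunking are assumptions made for the example.

```python
import hashlib

PAGE_CAPACITY = 4  # chunks per page, kept small for illustration (an assumption)


class ChunkStore:
    """Simplified model: pages of chunks, a chunk index, and per-stream manifests."""

    def __init__(self):
        self.pages = []        # committed pages, each a list of chunk payloads
        self.open_page = []    # the single currently open page
        self.chunk_index = {}  # chunk digest -> (page number, offset within page)
        self.manifests = {}    # stream name -> ordered list of chunk digests

    def add_chunk(self, stream, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.chunk_index:
            # New chunk: append it to the open page and index it.
            self.open_page.append(data)
            self.chunk_index[digest] = (len(self.pages), len(self.open_page) - 1)
            if len(self.open_page) == PAGE_CAPACITY:
                # The full page becomes part of the Chunk Store; a new page opens.
                self.pages.append(self.open_page)
                self.open_page = []
        # Whether new or already stored, the chunk is recorded in the stream's Manifest.
        self.manifests.setdefault(stream, []).append(digest)


def deduplicate(store, stream_name, payload, chunk_size=8):
    """Split a stream into fixed-size chunks and run each through the Chunk Index."""
    for i in range(0, len(payload), chunk_size):
        store.add_chunk(stream_name, payload[i:i + chunk_size])
```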
Running deduplicators in parallel is supported, but only one page is open at a time, so multiple deduplicators finding new chunks insert them into that shared page in turn, interleaving chunks from different streams. The more deduplicators run concurrently, the greater the potential for fragmentation of each stream's data.
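To see how the shared open page interleaves data, the sketch below continues the example above by alternating chunks from two streams, roughly as two concurrent deduplicators would; the strict round-robin scheduling is a simplification, not how the scheduler actually behaves.

```python
store = ChunkStore()
streams = {
    "stream-A": b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD",
    "stream-B": b"11111111222222223333333344444444",
}

# Pre-split each stream into 8-byte chunks, then interleave them chunk by
# chunk, as two concurrent deduplicators sharing one open page effectively do.
chunks = {name: [data[i:i + 8] for i in range(0, len(data), 8)]
          for name, data in streams.items()}
for a, b in zip(chunks["stream-A"], chunks["stream-B"]):
    store.add_chunk("stream-A", a)
    store.add_chunk("stream-B", b)

# Each stream's chunks now land in every page rather than being grouped
# together; written sequentially, each stream would have fit in one page.
for name in streams:
    pages_used = {store.chunk_index[h][0] for h in store.manifests[name]}
    print(name, "spans pages", sorted(pages_used))
```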
The result of this fragmentation is slower retrieval (restores) of the stream from the Chunk Store and longer run times for Garbage Collection, because retirements leave many small holes spread widely across the Chunk Store instead of being grouped into a smaller number of pages.