Deduplication ratios are determined by many environmental factors. Retention is one of the most important, because retention gives the deduplication engine the historical data it needs to tell unique data apart from data it has already seen.
If you keep only two weeks of data and that data changes heavily, your deduplication will be poor. Even if the data does not change as heavily, a short retention period still leaves much of your deduplication potential unused.
The number of weeks of retention you keep heavily impacts your ratios. The longer the retention, the more often the deduplication system sees the same data, which is why the deduplication ratio increases as retention increases. Many vendors claim a deduplication ratio of 15-20:1, but when you do the calculations, the retention needed to achieve this comes in at about 16 weeks. If you keep only two weeks of retention, you may only get about a 4:1 ratio.
Here is an example to illustrate this:
If you have 10TB of data and you keep four weeks of retention, then without deduplication you would store about 40TB of data (10TB x 4 weeks). With deduplication, assuming a 2% weekly change rate, you would store about 5.6TB of data, which is a deduplication ratio of about 7.1:1 (40TB ÷ 5.6TB = 7.1:1). However, if you have the same 10TB of data and you keep 16 weeks of retention, then without deduplication you would store about 160TB of data (10TB x 16 weeks). With deduplication, assuming a 2% weekly change rate, you would store about 8TB of data, which is a deduplication ratio of 20:1 (160TB ÷ 8TB = 20:1).
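The arithmetic in this example can be sketched in a few lines of Python. The text does not say how the 5.6TB and 8TB figures were derived, so the model below is an assumption: the first full backup is reduced 2:1 (10TB stored as 5TB), and each later week adds only the changed data (2% of 10TB, or 0.2TB). That assumption happens to reproduce both figures, but a real appliance will behave differently. The function names and the first_full_reduction parameter are made up for illustration.

```python
# A rough model of deduplicated capacity versus retention.
# ASSUMPTION (not stated in the article): the first full backup is
# reduced by `first_full_reduction` (2:1 here), and every later weekly
# backup stores only the changed data (`weekly_change` of the primary).


def stored_tb(primary_tb, weeks, weekly_change=0.02, first_full_reduction=2.0):
    """Estimate the TB actually stored after `weeks` weekly backups."""
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    first_full = primary_tb / first_full_reduction   # e.g. 10TB -> 5TB
    weekly_new = primary_tb * weekly_change          # e.g. 2% of 10TB = 0.2TB
    return first_full + (weeks - 1) * weekly_new


def dedup_ratio(primary_tb, weeks, weekly_change=0.02, first_full_reduction=2.0):
    """Ratio of data protected (without dedup) to data actually stored."""
    logical_tb = primary_tb * weeks                  # e.g. 10TB x 16 weeks
    return logical_tb / stored_tb(
        primary_tb, weeks, weekly_change, first_full_reduction
    )


if __name__ == "__main__":
    for weeks in (2, 4, 16):
        stored = stored_tb(10, weeks)
        ratio = dedup_ratio(10, weeks)
        print(f"{weeks:>2} weeks: {stored:.1f}TB stored, {ratio:.1f}:1")
```

Under these assumptions the sketch gives roughly 3.8:1 at two weeks, 7.1:1 at four weeks and 20:1 at 16 weeks, in line with the figures above. Try a higher weekly_change to see how quickly a heavier change rate erodes the ratio.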
If you can afford the space, consider increasing your retention period. You will see a better deduplication ratio, and you will save far more space relative to the amount of data you are protecting than you would with a shorter window. You won't see the change overnight, or even in a fortnight, but you will see it over the course of several weeks. Before purchasing a deduplication appliance and storage, however, always make sure that everything is factored in. You do not want to undercommit on storage, or purchase just enough because you were told you would get a "best case scenario" deduplication ratio.