Data Redundancy and Deduplication: Making Your Data More Efficient
What is data redundancy?
Data redundancy refers to having multiple copies of the same data stored in different places across a network or repeated within a single database. When an organisation has the same data repeated in this way, costs for storage can be higher, the data itself can be inconsistent, and restoring data from backups can be slower.
A simple example of data redundancy is as follows: an email server could contain 100 instances of the very same 1MB email attachment, because 100 staff have backed up their inboxes. That’s 100MB of storage space for exactly the same file, so 99MB of that backed-up data is taking up space unnecessarily and is therefore redundant.
Is data redundancy different to replication?
Yes, it is. Data replication is the deliberate copying of data so that copies are stored in different locations. Replicating data across systems enables users to access and share that data – for example, cloud services use replication so that users can access their files from almost any device with an internet connection. Data redundancy, by contrast, is often unintentional, and it can be (though is not always) detrimental to the organisation and its networks.
How does data redundancy happen?
Data redundancy is frequently accidental: in a busy organisation, it’s quite easy to end up with redundant data. Customer-facing businesses often capture the same data more than once – for example, customer names and personal details may be stored in different locations by both the sales and marketing departments. Another cause is multiple staff backing up data without being aware that others are backing up the same data, or previous backups being saved rather than overwritten. Sometimes, data redundancy arises from errors such as poor coding or issues with the data management system itself.
Are there any benefits to data redundancy?
Data redundancy isn’t always a bad thing – there are benefits to having multiple copies of the same data available. There is simply a limit to how much data redundancy is useful. Three advantages of limited data redundancy are:
- Increased availability. Having data stored in more than one location can make it quicker and easier for multiple users to access it
- Checking accuracy of data. If you have more than one entry for a customer, for example, you can cross-reference their details between entries and make sure data is correct and complete
- Minimised downtime in the event of an incident. If only one data storage location is compromised, having the same data in a different location means you don’t have to wait to restore from a backup to continue operating
What is deduplication and how does it help with data redundancy?
Deduplication is the process of identifying multiple instances of the same data and deleting all but one, to reduce demand on storage capacity and retrieval capabilities. Each redundant duplicate is replaced with a simple reference, called a ‘pointer’, to the one remaining instance of the original data.
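The idea of keeping one instance and replacing the rest with pointers can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular backup product works: the `DedupStore` class, its method names and the use of a SHA-256 hash to recognise identical data are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Hypothetical store: one copy of each unique blob, duplicates become pointers."""

    def __init__(self):
        self.blobs = {}     # digest -> the single stored copy of the data
        self.pointers = {}  # item name -> digest (the 'pointer')

    def put(self, name, data):
        # Identical content always produces the same digest.
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        self.pointers[name] = digest

    def get(self, name):
        # Follow the pointer back to the single stored instance.
        return self.blobs[self.pointers[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
attachment = b"x" * 1_000_000  # stand-in for a 1MB email attachment
for i in range(100):
    store.put(f"inbox-{i}/attachment", attachment)

print(len(store.pointers), "pointers,", len(store.blobs), "stored copy")
```

Every inbox can still retrieve the attachment with `store.get(...)`, but the bytes themselves are held only once.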
In our example of the email server containing 100 instances of the same 1MB attachment, data deduplication would free up 99MB, making it an extremely worthwhile process. In short, the more redundant data you have on your servers and on cloud systems, the greater the efficiencies you can make with deduplication.
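The arithmetic behind that 99MB figure generalises to any set of files. The sketch below assumes a hypothetical helper, `dedup_savings`, that takes one `(size_mb, copies)` pair per distinct file; it ignores the small amount of space the pointers themselves need.

```python
def dedup_savings(files):
    """files: list of (size_mb, copies) pairs, one pair per distinct file."""
    # Space used if every copy is stored in full.
    logical = sum(size * copies for size, copies in files)
    # Space used if each distinct file is stored exactly once.
    stored = sum(size for size, copies in files if copies > 0)
    return {"logical_mb": logical, "stored_mb": stored, "saved_mb": logical - stored}


# The email example: one 1MB attachment backed up 100 times.
result = dedup_savings([(1, 100)])
print(result)  # {'logical_mb': 100, 'stored_mb': 1, 'saved_mb': 99}
```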
Deduplication is a popular alternative to compressing files. Both methods aim to save space, but they work differently: deduplication eliminates unnecessary repeated copies of data, whereas compression reduces the size of each individual file while retaining the original data.
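To make the difference concrete, here is a small sketch that applies each method to ten identical copies of the same file. It uses Python’s standard `zlib` module for compression and a SHA-256 hash to spot duplicates; the file is random bytes, chosen as an assumption to stand in for data that compresses poorly (such as media that is already compressed). Real-world results depend heavily on the data.

```python
import hashlib
import os
import zlib

# Stand-in file: random bytes, which barely compress at all.
document = os.urandom(50_000)
copies = [document] * 10

# Compression: each copy is shrunk on its own, but all ten are still stored.
compressed_total = sum(len(zlib.compress(copy)) for copy in copies)

# Deduplication: identical copies collapse to one stored instance.
unique = {hashlib.sha256(copy).hexdigest(): copy for copy in copies}
dedup_total = sum(len(copy) for copy in unique.values())

# Compression retains the original data: decompressing gives it back exactly.
restored = zlib.decompress(zlib.compress(document))

print("original total:  ", sum(len(copy) for copy in copies))
print("compressed total:", compressed_total)
print("deduplicated:    ", dedup_total)
```

With highly repetitive text instead of random bytes, compression would do far better than it does here, which is one reason the two techniques are often weighed against the kind of data being stored.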
Why is deduplication important?
Dealing with data redundancy through deduplication helps you use storage and bandwidth far more efficiently, and ultimately requires less storage space for the same amount of data. This can cut expenditure on physical storage (and the power needed to run and cool it) and cloud storage, even as the volume of data you process increases.
Deduplication is also important to Disaster Recovery (DR) and Business Continuity (BC), speeding up the recovery process as there is less data to restore. Using the 1MB email attachment example again, in the event of an incident, everyone who originally received the attachment would have it back faster, as deduplication means the backup only requires one ‘master’ copy.
Local deduplication vs global deduplication
What we’ve described so far is ‘local deduplication’ – removing redundant data from a single device or system before the data is backed up.
Global deduplication works across multiple devices or systems and is commonly used in large-scale environments, e.g. datacentres or cloud storage services. At each stage of deduplication, more redundant data is stripped out, which keeps the space required to a minimum. However, it does mean recovering data requires going back through each stage of deduplication, restoring the ‘redundant’ data as it goes.
Local deduplication, on the other hand, evaluates data redundancy per device before the data is backed up. So while global deduplication can remove a duplicate that appears on any of the devices that held the original data, local deduplication only removes duplicates found within each specific device.
Because it works off a single deduplication index, global deduplication often has a better reduction rate. However, local deduplication can result in better performance because the data is easier to access. Which method is used will depend on the volume of data a business generates, as well as the nature of that data and how time-critical it is. Data governance and compliance regulation may also be a factor in the type of deduplication used to remove redundant data from your organisation’s networks.
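The difference in reduction rate can be seen in a toy comparison. In this sketch, which is illustrative only, `local_dedup` keeps a separate index per device and `global_dedup` keeps one shared index; the function names, device names and file contents are all made up for the example.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).hexdigest()


def local_dedup(devices):
    """One index per device: duplicates are only removed within a device."""
    return {
        name: {digest(data): data for data in files}
        for name, files in devices.items()
    }


def global_dedup(devices):
    """One shared index: duplicates are removed across all devices."""
    index = {}
    for files in devices.values():
        for data in files:
            index.setdefault(digest(data), data)
    return index


devices = {
    "laptop-a": [b"report", b"report", b"photo"],
    "laptop-b": [b"report", b"budget"],
}

local_count = sum(len(index) for index in local_dedup(devices).values())
global_count = len(global_dedup(devices))

print("items stored with local deduplication: ", local_count)   # 4
print("items stored with global deduplication:", global_count)  # 3
```

The report that exists on both laptops is stored twice under local deduplication (once per device) but only once under global deduplication, which is where the better reduction rate comes from.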
Deduplication from BackupVault
When choosing a cloud backup provider, it’s a good idea to check they offer a suitable deduplication option. With BackupVault, deduplication happens automatically.
As well as deduplication, we secure your data both during transfer and at rest with the highest-grade encryption. Backups are automated, our UK datacentres are fully secure, and your data can be restored in as little as three clicks. Get in touch today to find out how we can protect your business-critical data.
BackupVault: what have you got to lose?