Cloud and Datacenter Technical Consulting: May 2010

Data Reduction Technology
Data volumes keep growing. Companies are struggling to contain storage costs while also investing heavily in backup and disaster recovery (DR) solutions to protect critical data and keep it highly available. The data reduction technologies now offered by the major storage vendors shrink the stored data footprint, which can help with performance and data integrity, eliminate redundant data, reduce data protection costs, improve storage utilization, and speed up remote backup, replication, and disaster recovery.
A number of technologies fall under the heading of data reduction or deduplication.
NetApp and EMC both provide data reduction technology.
NetApp Inc.: deduplication that works at the block level; it is the most prominent of the offerings aimed at primary storage.
EMC: Celerra Data Deduplication, which performs compression before tackling deduplication on file-based data.
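The ordering described for Celerra (compress first, then deduplicate whole files) can be illustrated with a toy store. This is a minimal sketch under stated assumptions, not EMC's implementation: the class and method names are hypothetical, zlib stands in for whatever compressor a product would use, and deduplication here is plain file single instancing keyed on a SHA-256 hash of the compressed bytes. That keying works in this sketch because the same input compressed with the same zlib settings produces identical output, so identical files hash identically.

```python
import hashlib
import zlib


class CompressThenDedupStore:
    """Hypothetical file store: compress each file, then single-instance it.

    Illustrates the compress-before-dedupe ordering only; it is not EMC's code.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}    # digest of compressed bytes -> compressed bytes
        self.catalog: dict[str, str] = {}    # file path -> digest of its content

    def write(self, path: str, data: bytes) -> None:
        compressed = zlib.compress(data)                 # step 1: compress the whole file
        digest = hashlib.sha256(compressed).hexdigest()
        self.blobs.setdefault(digest, compressed)        # step 2: keep one copy per unique file
        self.catalog[path] = digest

    def read(self, path: str) -> bytes:
        return zlib.decompress(self.blobs[self.catalog[path]])

    def physical_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = CompressThenDedupStore()
report = b"quarterly sales figures\n" * 2000
store.write("/home/alice/report.txt", report)
store.write("/home/bob/report-copy.txt", report)   # identical copy: no new blob is stored
print("logical bytes :", 2 * len(report))
print("physical bytes:", store.physical_bytes())
```

The second write only adds a catalog entry, so the two logical copies share one compressed blob.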
The following table shows four major data reduction technologies, along with the space savings they can typically be expected to deliver when applied to a file server or NAS data set.
Technology                      "Typical" space savings    Resource footprint
File-level deduplication        10%                        Low
Fixed-block deduplication       20%                        High
Variable-block deduplication    28%                        High
Compression                     40% to 50%                 Medium
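Data reduction is often quoted as a ratio rather than a percentage. A saving of s means the data occupies (1 - s) of its original size, which corresponds to a reduction ratio of 1 / (1 - s) : 1, so 40% savings is about 1.67:1. A minimal sketch of the conversion applied to the table's typical figures follows; the function names are hypothetical, and compression uses the low end of its 40% to 50% range.

```python
def reduction_ratio(savings: float) -> float:
    """Convert a fractional space saving (0.40 for 40%) into an N:1 reduction ratio."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return 1 / (1 - savings)


def stored_size(logical: float, savings: float) -> float:
    """Physical capacity needed after reduction, in the same units as `logical`."""
    return logical * (1 - savings)


# "Typical" savings from the table (compression at the low end of its range).
typical = {
    "File-level deduplication": 0.10,
    "Fixed-block deduplication": 0.20,
    "Variable-block deduplication": 0.28,
    "Compression": 0.40,
}
for name, s in typical.items():
    print(f"{name:30s} {reduction_ratio(s):.2f}:1   1000 GB -> {stored_size(1000, s):.0f} GB")
```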
File-level deduplication, also known as file single instancing, provides relatively modest space savings, but it is also relatively lightweight in terms of the CPU and memory resources required to implement it.
Fixed-block deduplication provides better space savings but is far more resource-intensive, because of the processing power required to calculate a hash for each block of data and the memory required to hold the index used to determine whether a given hash has been seen before.
Variable-block deduplication provides slightly better space savings than fixed-block deduplication, but the difference is not significant when applied to file system data. Variable-block deduplication is much more effective on data sets that contain misaligned data, such as backup data in backup-to-disk or VTL environments: inserting even a few bytes near the start of a stream shifts every subsequent fixed-size block boundary, whereas variable-length segment boundaries are typically chosen from the content itself and fall back into step after the change. Its resource footprint is similar to that of fixed-block deduplication; it requires similar amounts of memory and slightly more processing power.
Compression is often considered to be different from deduplication. However, compression can be described as infinitely variable, bit-level, intra-object deduplication. Technical pedantry aside, it is simply another technique that alters the way in which data is stored to improve storage efficiency. In fact, it offers by far the greatest space savings of all the techniques listed for typical NAS data, and its resource footprint is relatively modest: it is relatively compute-intensive but requires very little memory.
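The misalignment effect can be demonstrated with a small experiment. The sketch below compares fixed-size blocking with content-defined (variable-length) chunking driven by a gear-style rolling hash, which is one common way to choose variable boundaries. The chunk sizes, boundary mask, and function names are assumptions made for illustration and are not taken from any NetApp, EMC, or other product. It indexes the chunks of a 256 KiB random buffer, inserts a single byte at the front, and counts how many chunks of the shifted copy are not already in the index.

```python
import hashlib
import random

# Hypothetical parameters for this sketch; not taken from any product.
FIXED_SIZE = 4096
MIN_CHUNK, MAX_CHUNK = 1024, 16384
BOUNDARY_MASK = (1 << 12) - 1          # past MIN_CHUNK, a cut occurs with probability ~1/4096 per byte
_gear_rng = random.Random(0)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]   # one random 64-bit value per byte value


def fixed_chunks(data: bytes) -> list[bytes]:
    """Fixed-block: cut every FIXED_SIZE bytes, regardless of content."""
    return [data[i:i + FIXED_SIZE] for i in range(0, len(data), FIXED_SIZE)]


def variable_chunks(data: bytes) -> list[bytes]:
    """Variable-block: content-defined chunking with a gear rolling hash.

    The low 12 bits of the hash depend only on the last 12 bytes seen, so a
    cut point is determined by local content, not by its offset in the stream.
    """
    chunks: list[bytes] = []
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks


def count_new_chunks(old: bytes, new: bytes, chunker) -> tuple[int, int]:
    """Index `old`, then return (chunks of `new` not in the index, total chunks of `new`)."""
    index = {hashlib.sha256(c).digest() for c in chunker(old)}
    pieces = chunker(new)
    unseen = sum(1 for c in pieces if hashlib.sha256(c).digest() not in index)
    return unseen, len(pieces)


original = random.Random(42).randbytes(256 * 1024)
shifted = b"X" + original                    # one byte inserted at the front

fixed_result = count_new_chunks(original, shifted, fixed_chunks)
variable_result = count_new_chunks(original, shifted, variable_chunks)
print("fixed-block    (new, total):", fixed_result)
print("variable-block (new, total):", variable_result)
```

With fixed blocks, every block of the shifted copy is new. With content-defined chunking, usually only the first chunk or two are new, because cut points are recomputed from the content and realign shortly after the inserted byte. The sketch needs Python 3.9+ for `random.Random.randbytes` and built-in generic annotations.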
Technological Classification
The practical benefits of this technology depend on several factors:
Point of application: source vs. target
Time of application: inline vs. post-process
Granularity: file vs. sub-file level
Algorithm: fixed-size blocks vs. variable-length data segments
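The point-of-application distinction can be sketched by counting the payload bytes that cross the network during a backup. In this hypothetical model (the `Target` class, the 4 KiB fixed chunk size, and the function names are all assumptions), target-side deduplication receives the full stream and deduplicates it on arrival, while source-side deduplication hashes chunks at the client, asks the target which digests it lacks, and sends only those chunks. The byte counts ignore the 32-byte digests exchanged per chunk.

```python
import hashlib
import random

CHUNK = 4096  # hypothetical fixed chunk size used by both paths


def chunks(data: bytes) -> list[bytes]:
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


class Target:
    """Backup target with a chunk store keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self.store: dict[bytes, bytes] = {}

    def missing(self, digests: list[bytes]) -> set[bytes]:
        return {d for d in digests if d not in self.store}

    def put(self, digest: bytes, chunk: bytes) -> None:
        self.store[digest] = chunk

    def ingest(self, data: bytes) -> None:
        # Target-side: the whole stream has already arrived; deduplicate it here.
        for c in chunks(data):
            self.store.setdefault(hashlib.sha256(c).digest(), c)


def target_side_backup(target: Target, data: bytes) -> int:
    """Return payload bytes sent: the full stream crosses the network."""
    target.ingest(data)
    return len(data)


def source_side_backup(target: Target, data: bytes) -> int:
    """Return payload bytes sent: only chunks the target does not already hold."""
    pieces = {hashlib.sha256(c).digest(): c for c in chunks(data)}
    sent = 0
    for digest in target.missing(list(pieces)):
        target.put(digest, pieces[digest])
        sent += len(pieces[digest])
    return sent


monday = random.Random(1).randbytes(1 << 20)       # 1 MiB of sample data
tuesday = monday[:-CHUNK] + bytes(CHUNK)           # same data with the last 4 KiB zeroed

t_target, t_source = Target(), Target()
target_sent = [target_side_backup(t_target, day) for day in (monday, tuesday)]
source_sent = [source_side_backup(t_source, day) for day in (monday, tuesday)]
print("target-side bytes sent:", target_sent)
print("source-side bytes sent:", source_sent)
```

On the second backup, the target-side path still transfers the full 1 MiB while the source-side path transfers only the changed chunk; in both cases the target ends up holding the same set of unique chunks.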

Tuesday, May 25, 2010
