EMC Unveils Data Domain Global Deduplication Array
EMC Corporation, the world leader in information infrastructure solutions, recently announced the Data Domain Global Deduplication Array (GDA), the industry’s highest performance inline deduplication storage system for enterprise backup applications.
The GDA, based on a new multi-controller extension of the Data Domain architecture, offers inline global deduplication and a global namespace for all data stored in the dual-controller system. With throughput of up to 12.8 TB/hour, the GDA sets consistently high benchmarks across the common data center backup metrics.
The GDA provides up to 14.2 PB of logical backup capacity, driving new levels of simplicity for data center backup consolidation across workloads as diverse as very large databases, VMware images, and unstructured data.
The Global Deduplication Array presents a single inline deduplication storage pool to the backup application across two EMC Data Domain DD880 controllers.
A large datacenter’s backup jobs are dynamically and transparently load balanced across the controllers, simplifying capacity management, performance management and backup administration.
Unlike most multi-controller deduplication systems, the inline Global Deduplication Array is tightly coupled with backup software, enabling industry-leading inline deduplication performance, dynamic load distribution and simple operation.
The GDA distributes parts of the deduplication process to the backup servers to reduce network load and increase the throughput performance of the GDA controllers.
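Source-side deduplication in general follows the pattern below, shown as a minimal sketch. The backup server fingerprints each segment, asks the storage system which fingerprints it has not yet seen, and sends only those segments over the network. This is not EMC's implementation. The fixed segment size, the SHA-256 fingerprints and the `DedupStore`/`backup` names are hypothetical, chosen for illustration only. Production systems typically use variable-size segmentation.

```python
import hashlib

# Hypothetical fixed segment size; real systems typically use variable-size segments.
SEGMENT_SIZE = 8 * 1024


def segment(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split a backup stream into fixed-size segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical stand-in for a deduplicating controller's segment store."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        # The controller reports which fingerprints it has not stored yet.
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, fp: str, seg: bytes) -> None:
        self.segments[fp] = seg


def backup(data: bytes, store: DedupStore) -> tuple[list[str], int]:
    """Fingerprint on the backup server and ship only new segments.

    Returns the ordered fingerprint list (the file "recipe") and the
    number of segment bytes actually sent to the controller.
    """
    segs = segment(data)
    fps = [hashlib.sha256(s).hexdigest() for s in segs]
    needed = store.missing(fps)
    sent = 0
    for fp, s in zip(fps, segs):
        if fp in needed:
            store.store(fp, s)
            needed.discard(fp)  # don't resend a segment repeated within this stream
            sent += len(s)
    return fps, sent


if __name__ == "__main__":
    store = DedupStore()
    stream = b"A" * SEGMENT_SIZE * 4 + b"B" * SEGMENT_SIZE * 4
    _, first = backup(stream, store)
    _, second = backup(stream, store)
    print(first, second)  # 16384 0
```

The first backup sends only the two distinct segments. A repeat backup of the same data sends nothing. This saving in network load is the one the paragraph above describes.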
The GDA offers more than three times the backup throughput per controller of competing deduplication configurations and is by far the fastest inline deduplication system in the industry.
This distributed deduplication processing throughput is anchored by the native speed advantages of the multi-core CPUs in the GDA controllers and the Data Domain SISL (Stream-Informed Segment Layout) scaling architecture that minimises the number of disk accesses required in the deduplication process.
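Deduplication systems commonly reduce disk accesses by keeping a compact in-memory summary of stored fingerprints, such as a Bloom filter. This lets most new segments be recognised as new without touching the on-disk index. The sketch below illustrates that general technique only. It is not a description of SISL's internals, and the `BloomFilter` and `SegmentIndex` classes and their sizes are hypothetical.

```python
import hashlib


class BloomFilter:
    """Compact in-memory summary of which fingerprints have been stored."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class SegmentIndex:
    """Hypothetical fingerprint index that counts the disk lookups it performs."""

    def __init__(self) -> None:
        self.summary = BloomFilter()
        self.on_disk: set[bytes] = set()  # stands in for a disk-resident index
        self.disk_lookups = 0

    def is_duplicate(self, fp: bytes) -> bool:
        if not self.summary.might_contain(fp):
            return False  # definitely new: no disk access needed
        self.disk_lookups += 1  # possibly a duplicate: confirm against the disk index
        return fp in self.on_disk

    def insert(self, fp: bytes) -> None:
        self.summary.add(fp)
        self.on_disk.add(fp)
```

With this design, a stream of mostly new data triggers almost no index lookups. Disk accesses are spent only on likely duplicates and the rare false positive.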
At initial release, the platform supports Symantec NetBackup and Backup Exec through backup server-based OpenStorage plug-in software. Later in 2010, it will also support EMC NetWorker using integrated software.
“Figuring out how to get backups done within the allotted period of time in the face of data growth is still the biggest data protection challenge that organizations face according to our research,” said Brian Babineau, senior consulting analyst with Enterprise Strategy Group.
“With their Data Domain Global Deduplication Array, EMC has far exceeded the inline deduplication performance benchmark it set with its previous top-of-line Data Domain system, but more importantly, the company has given customers a way to protect more of their data in a shorter period of time. We expect more companies to evaluate integration between backup software and deduplication storage to maximise these performance levels and data reduction results while consolidating administrative tasks.”
For backup environments with hundreds of terabytes to process, administrators can target their backup policies to a GDA and leverage a common deduplication storage environment for all data protected by those policies.
The GDA accommodates up to 270 concurrent backup jobs and up to 12.8 TB/hour of throughput, allowing more backups to complete sooner and easing pressure on limited backup windows. The GDA global namespace minimises the need to reconfigure complex backup policies, and its global deduplication technology dynamically load balances policies for performance and capacity management. As a result, very large data sets can be protected with little administrative effort while deduplication efficiency is maximised and the physical storage footprint is minimised.
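The arithmetic below is a rough illustration based on the quoted peak figure. It assumes a hypothetical 100 TB backup set that sustains the full rated throughput, with load split evenly across the two DD880 controllers. Actual backup times will depend on the workload and environment.

```latex
\[
\frac{12.8\ \text{TB/h}}{2\ \text{controllers}} = 6.4\ \text{TB/h per controller},
\qquad
t_{100\,\text{TB}} = \frac{100\ \text{TB}}{12.8\ \text{TB/h}} \approx 7.8\ \text{h}
\]
```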
With the EMC Data Domain Replicator software option, the GDA can automate WAN vaulting for use in disaster recovery (DR), remote office backup, or multi-site tape consolidation. A single GDA can support a replication fan-in of up to 270 remote offices using smaller deduplication storage systems such as the Data Domain DD140 or the DD600 series appliances.
Cross-site deduplication further minimises the required bandwidth, since only the first instance of data is transferred across any of the WAN segments between sites. Additionally, for fast offsite protection and consolidation of tape-out operations, the GDA provides up to 54 TB/hour of replication throughput.
Like all Data Domain systems, the new GDA is simple to install and flexible enough to be deployed in existing user environments without disruption. Backed by EMC's optional 24x7x365 enterprise-class service, the GDA integrates seamlessly with Symantec NetBackup and Backup Exec environments using the EMC Data Domain OpenStorage software option.
“The EMC Data Domain Global Deduplication Array, while very sophisticated under the hood, builds on the mature foundation of the existing Data Domain platform and retains its appliance simplicity,” said Servaas Venter, Acting Country Manager of EMC Southern Africa.
“Its deduplication is inline, it’s blistering fast, and it’s big enough for significant datacenter backup consolidation, but its dynamic load balancing, single deduplication storage pool and namespace, and tight integration with backup software mean the GDA is easier to operate than competitors who don’t have its scale. EMC has once again moved the dial on disk-based data protection.”
The GDA is based on the same CPU-centric approach to inline data deduplication as all EMC Data Domain systems. Unlike most deduplication approaches, which are added as afterthoughts to existing disk arrays, Virtual Tape Libraries (VTLs) or backup software, Data Domain combines the following efficiencies:
• SISL scaling architecture leverages CPU improvements to increase inline deduplication speed while minimising reliance on disk accesses for performance. Over the last six years, Data Domain systems have consistently improved throughput nearly 90-fold and capacity more than 225-fold. Based on Intel’s CPU roadmap, throughput is expected to continue growing significantly.
• High-performance inline deduplication for simplicity, minimising system resources, administration and internal process contention.
• Green storage efficiency for a smaller system footprint and lower power consumption.
• Data Domain Data Invulnerability Architecture defends against data integrity issues by providing continuous verification during storage and recovery of data.
The EMC Data Domain Global Deduplication Array will be generally available in the second quarter of 2010.