In the all-flash storage era, innovations in software and hardware technologies are giving enterprise data centers much-needed support for higher efficiency and stronger IT enablement. What cannot be overlooked in all the excitement is that high-performance flash storage carries the core service systems of industry customers of all shapes and sizes. If a problem were to occur, the organisation would be hit hard.
Qualix Group once released figures showing the impact of service interruptions on various industries. In transportation, a one-hour stoppage would cost 150,000 USD on average, while for banks the figure rises to 270,000 USD. The same one-hour stoppage would cost a telecommunications company 350,000 USD and a manufacturer 420,000 USD, and securities traders would top the list at 450,000 USD.
Considering the huge losses when things go wrong, ensuring service continuity is a top priority for the mission-critical services carried on all-flash storage systems. To that end, one storage vendor after another has launched its own active-active solution, but the architectures vary in capability. Huawei’s next-generation all-flash OceanStor Dorado V3 with HyperMetro safeguards the reliability of data and keeps latency down to a predictable 0.5 ms, amongst the lowest in the industry. The key technologies going into the HyperMetro feature are covered in the following sections.
Gateway-free active-active architecture
The most important factor in determining the speed of an active-active layout is the gateway between the arrays. Replacing the physical gateway with a virtual one, as with XtremIO active-active on EMC VPLEX or FlashSystem running on IBM SVC, minimises end-to-end latency but adds to networking complexity. At the same time, an external gateway adds nodes to the mix and increases procurement and management costs.
In active-active architectures with gateways, host read/write I/Os must pass through a virtualised gateway for processing, which lengthens the I/O processing path and increases latency. Tests have shown that virtualised gateways in OLTP service profiles add 1 ms to 1.6 ms of latency.
Huawei OceanStor Dorado V3 adopts a gateway-free active-active architecture, removing the gateways on both sides. This immediately helps the customer reduce procurement costs and reduces the possible points of failure. The result is reduced latency, improved reliability, and accelerated performance. Adding to the attraction, networking is also much simpler. The number of deployment steps is cut in half, thereby shortening the delivery cycle.
Active-active architecture
HyperMetro is deployed on two arrays in an active-active profile. Data on the active-active LUNs at both ends is synchronised in real time, and both ends process read and write I/Os from application servers to provide the servers with parallel active-active access. Should either array encounter a fault, services are seamlessly switched to the other end without interrupting service access.
In contrast to the active-passive solutions being promoted by other vendors, Huawei HyperMetro fully utilises computing resources with its load balancing capabilities while effectively reducing inter-array communication. The significantly shorter I/O paths yield higher access performance and quicker failover. Data from lab testing shows that in OLTP scenarios with a 7:3 read/write ratio, performance in active-active mode is 20% higher than that in active-passive mode.
The active-active (A-A) architecture further improves the overall reliability of the solution and its ability to tolerate faults. For example, if the link between the host and one array goes down, the A-A architecture automatically switches the host over to the other array. In an active-passive (A-P) architecture, the same fault would interrupt services.
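The mirrored-write and failover behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HyperMetro's implementation: the `Array` and `ActiveActivePair` classes are hypothetical, the arrays are plain in-memory dictionaries, and resynchronisation after a failed array recovers is deliberately left out.

```python
class Array:
    """Toy in-memory stand-in for one storage array (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}    # logical block address -> data
        self.online = True  # set to False to simulate an array or link fault


class ActiveActivePair:
    """Mirrors writes to both arrays and serves reads from any online array."""

    def __init__(self, first, second):
        self.arrays = [first, second]

    def write(self, lba, data):
        # Dual write: every online array receives the data.
        targets = [arr for arr in self.arrays if arr.online]
        if not targets:
            raise IOError("no array available")
        for arr in targets:
            arr.blocks[lba] = data
        return [arr.name for arr in targets]

    def read(self, lba, preferred=0):
        # Try the preferred array first, then fail over to its peer.
        for arr in (self.arrays[preferred], self.arrays[1 - preferred]):
            if arr.online:
                return arr.name, arr.blocks[lba]
        raise IOError("no array available")


site_a, site_b = Array("A"), Array("B")
pair = ActiveActivePair(site_a, site_b)
pair.write(0, b"order-42")       # lands on both arrays
site_a.online = False            # simulate a fault on array A
served_by, data = pair.read(0)   # the read is served by array B instead
```

Because both arrays already hold the data, the read after the simulated fault needs no recovery step, which is the property the active-active design relies on.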
FastWrite IO — Dual-Write process optimisation
Because active-active layouts require real-time synchronisation, a long distance between the data centers increases the latency of cross-site interaction, which in turn becomes the performance bottleneck. HyperMetro employs a FastWrite function to optimise the data transfer protocol between storage arrays. With SCSI’s First Burst Enabled function, the number of link interactions needed to transfer write data is cut in half.
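A rough calculation shows why halving the interactions matters over distance. The sketch below assumes that a conventional SCSI write needs two link round trips (the write command answered by a transfer-ready message, then the data answered by a status response), that a first-burst write sends command and data together and so needs one, and that the whole write fits in the first burst. The 1.3 ms round-trip time is the figure this article quotes for a 100 km link; the function and constant names are illustrative only.

```python
def cross_site_write_ms(rtt_ms, round_trips):
    """Link time spent on one mirrored write, ignoring array processing time."""
    return round_trips * rtt_ms


RTT_100KM_MS = 1.3           # round trip over a 100 km link, per the article
STANDARD_ROUND_TRIPS = 2     # assumed: command/transfer-ready, then data/status
FIRST_BURST_ROUND_TRIPS = 1  # assumed: command and data together, then status

standard_ms = cross_site_write_ms(RTT_100KM_MS, STANDARD_ROUND_TRIPS)
fast_ms = cross_site_write_ms(RTT_100KM_MS, FIRST_BURST_ROUND_TRIPS)
saved_ms = standard_ms - fast_ms

print(f"standard write: {standard_ms:.1f} ms")  # standard write: 2.6 ms
print(f"first burst:    {fast_ms:.1f} ms")      # first burst:    1.3 ms
print(f"saved per write: {saved_ms:.1f} ms")    # saved per write: 1.3 ms
```

Under these assumptions the saving per write equals one full round trip, so it grows in direct proportion to the distance between the sites.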
Optimised Access Routing
In active-active data service cases, the distance between the two sites is the key determinant of I/O access performance. Working with Huawei UltraPath multipathing software, HyperMetro provides two I/O access policies for customers to choose from, based on the distance between sites: load balancing mode and preferred array mode.
Load balancing mode is mainly used in scenarios where the active-active services are deployed in the same data center. In this scenario, host traffic loads can be balanced across the two storage devices, maximising resource utilisation and improving overall access performance.
Preferred array mode is mainly used in scenarios where active-active services are deployed across DCs separated by a considerable distance. In these scenarios, cross-site access between the active-active DCs is costly. If the link distance between the DCs is 100 km, a round trip typically takes approximately 1.3 ms. Preferred array mode reduces cross-site interactions, improving I/O performance.
In data read scenarios, a service host only needs to read data from the active-active storage array of the local DC, which avoids cross-DC data reads and improves overall access performance.
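The difference between the two access policies can be sketched as a small path selector. This is an illustrative model only and does not reflect the UltraPath API: the `PathSelector` class, its mode strings, and the array names are all hypothetical. Load balancing is modelled as simple round robin, and preferred array mode falls back to the other array only when the preferred one is unreachable.

```python
import itertools


class PathSelector:
    """Chooses which array receives the next host I/O (hypothetical model)."""

    MODES = ("load_balancing", "preferred_array")

    def __init__(self, arrays, mode="load_balancing", preferred=None):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        if mode == "preferred_array" and preferred not in arrays:
            raise ValueError("preferred array must be one of the arrays")
        self.arrays = list(arrays)
        self.mode = mode
        self.preferred = preferred
        self._round_robin = itertools.cycle(self.arrays)

    def pick(self, reachable):
        """Return the array for the next I/O, given the set of reachable arrays."""
        # Preferred array mode: stay local whenever the local array is reachable.
        if self.mode == "preferred_array" and self.preferred in reachable:
            return self.preferred
        # Load balancing, or the preferred array is down: rotate over the rest.
        for _ in range(len(self.arrays)):
            candidate = next(self._round_robin)
            if candidate in reachable:
                return candidate
        raise RuntimeError("no reachable array")


both = {"local", "remote"}
balanced = PathSelector(["local", "remote"])
pinned = PathSelector(["local", "remote"], mode="preferred_array", preferred="local")

balanced_picks = [balanced.pick(both) for _ in range(4)]  # alternates between arrays
pinned_picks = [pinned.pick(both) for _ in range(4)]      # always the local array
failover_pick = pinned.pick({"remote"})                   # local down: use remote
```

In this model, the pinned selector never sends an I/O across sites while the local array is healthy, which mirrors the local-read behaviour described above.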
Tandem injections from HyperMetro + FlashLink
FlashLink technology works with the HyperMetro feature to deliver precise CPU prioritisation and I/O scheduling, enabling mission-critical services to be processed in their own zone. Other tasks cannot congest the zone and each resource has its own highest priority settings in terms of host read/write requests handled in the system. This prioritisation helps ensure the active-active service latency remains predictable and consistent even as loads increase. In addition, after large blocks are aggregated by the ROW mechanism, the data is written to SSDs sequentially. Disk/controller collaboration and intelligent recognition functions identify hot and cold user data and metadata, thereby reducing write amplification, extending the disk service life, and improving the overall write performance of the storage system.
HyperMetro can work with inline deduplication, compression, ROW snapshot, and other features all at the same time. These features involve a large amount of metadata. FlashLink technology introduces improved metadata caching mechanisms to ensure frequently accessed metadata can be queried directly from memory, resulting in lower TCO for users. The wider range of disaster recovery functions delivers better reliability while ensuring stable performance.
The series of reliability and performance optimisation designs in Huawei HyperMetro contribute to the lightning-fast performance and rock-solid stability of the OceanStor Dorado V3 – the optimal choice in the all-flash era with its impressive throttling in carrying mission-critical services.