Black Friday takes place globally on Friday 23 November and is widely considered the busiest shopping day of the year. While brick-and-mortar retailers beef up security to control massive in-store crowds, online retailers are strengthening their systems to prevent the website crashes the day is infamous for.

Each year, millions of people are left frustrated when their favourite e-commerce websites let them down. Yet according to BBD CIO Tony van der Linden, there is no way for retailers to completely prevent system failures. “While most retailers get it right most of the time and can handle client requests in a performant manner under normal trading conditions, Black Friday is in no way normal trading conditions, as the sheer number of users stresses not just the infrastructure, but the website that delivers the service. At this point, the conversation moves away from performance to one of scalability.”

Scalability can be achieved either vertically or horizontally. Vertical scaling means adding more resources to a single physical unit, such as adding memory to a computer to create capacity. Horizontal scaling means adding more logical units, such as an additional server in a cluster, so that the required tasks are shared between them. The challenge with either approach is using the additional resources effectively. Performance bottlenecks, in turn, have a significant and unpredictable impact on scalability, at worst resulting in one non-performant system instance affecting all other instances.
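The effect of one slow instance on a horizontally scaled cluster can be illustrated with a minimal Python sketch. It assumes simple round-robin distribution, and the server names and timings are hypothetical; real load balancers use more sophisticated strategies, so this is an illustration rather than a description of any retailer's setup.

```python
from itertools import cycle


def distribute(requests, servers):
    """Spread a number of requests across servers in round-robin order."""
    loads = {server: 0 for server in servers}
    rotation = cycle(servers)
    for _ in range(requests):
        loads[next(rotation)] += 1
    return loads


def completion_time(requests, seconds_per_request):
    """Time until every request is served, with servers working in parallel.

    seconds_per_request maps a (hypothetical) server name to how long that
    server needs for one request. The batch is only done when the slowest
    server has finished its share.
    """
    loads = distribute(requests, list(seconds_per_request))
    return max(loads[name] * seconds_per_request[name] for name in loads)


# Assumed figures, chosen only to make the arithmetic easy to follow.
one_server = completion_time(100, {"web-1": 1.0})
two_servers = completion_time(100, {"web-1": 1.0, "web-2": 1.0})
with_slow_server = completion_time(
    100, {"web-1": 1.0, "web-2": 1.0, "web-3": 1.0, "web-4": 10.0}
)
```

Under these assumptions, doubling the number of healthy servers halves the time needed to serve 100 requests, while adding a fourth server that is ten times slower leaves the cluster slower than a single healthy server on its own.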

In layman’s terms, the difference between performance and scalability can be explained with a simple analogy. Performance is being the only car on a single-lane highway, able to drive at top speed at any given time. To add more cars to the equation, the infrastructure has to be improved by widening the highway and adding lanes so that multiple cars can travel at the same pace. This is scalability.

There are multiple factors contributing to successful scalability, including the ability of service providers to handle the load, the hardware used to run the application and the way the application itself is written. “Every point is a potential point of failure,” van der Linden adds. “By trying to prevent these failures, retailers are actively implementing microservices architecture. And while this can improve scalability, it does not limit system failures.”

It’s important to remember that every system has a breaking point. Online retailers need to decide whether their breaking point is acceptable to them. “How retailers handle failure and recover from the breakage is more important than attempting to prevent failure entirely. You can test as many times as you want, but you won’t really know until you’re out in the wild what’s going to happen.”

While retailers can’t prevent a Black Friday blackout, they can reduce the damage it causes. According to van der Linden, system crashes aren’t only a tech problem but a business problem as well. “Chinese multi-national conglomerate, Alibaba, is an excellent example of the ability to deliver service while still handling massive loads. With over 500 million users a month, Alibaba has the correct tech and business processes in place to deal with high traffic surges. But even such a gigantic retailer is not immune to failure.”

Unfortunately, there is no quick fix for the Black Friday masses. “There’s no silver bullet,” van der Linden adds. “System outages are frustrating for both retailers and users, but the best way to handle recovery is to record as much as possible the first time a system fails and improve systematically.”
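As a rough illustration of recording as much as possible when a failure occurs, the Python sketch below captures the error together with whatever context the caller supplies. The function names, the "checkout" logger name and the context fields are all hypothetical and not taken from any retailer's system.

```python
import logging
import time

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("checkout")


def record_failure(error, context):
    """Build and log a record describing a failure and its circumstances."""
    record = {
        "timestamp": time.time(),
        "error_type": type(error).__name__,
        "message": str(error),
        **context,  # e.g. endpoint, active user count, server name
    }
    logger.error("request failed: %s", record)
    return record


def handle_request(handler, context):
    """Run a request handler; on failure, record the details and return None."""
    try:
        return handler()
    except Exception as error:
        record_failure(error, context)
        return None
```

A caller might wrap each request as `handle_request(process_order, {"endpoint": "/pay"})`, where `process_order` stands in for the real handler, so that every failure leaves behind a record that can be reviewed afterwards.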