Managing the data deluge

How to analyse and tame ‘big data’.

February 1, 2011

As transactions multiply exponentially, storage requirements are strained, processing power is driven to new limits, and organisations are flooded with data.

The challenge of coping with the data deluge has forced its way out of the data centre and into the boardroom, says Michael de Andrade, MD of business intelligence (BI) and data management solutions provider EnterpriseWorx.

“It’s no accident that The Economist earlier this year published a special report on managing information,” he says. “Storing data has become a challenge, not to mention mining it for useful information and trends.”

American retail giant Wal-Mart, soon to enter the South African market, handles more than one million customer transactions an hour, feeding databases estimated at more than 2.5 petabytes (a petabyte is 1 000 terabytes), while social networking website Facebook hosts more than 40 billion photos, according to The Economist. Google, handling half the world’s Internet searches, answers around 35 000 queries a second, the report states.

International Data Corp estimates that some 1 200 exabytes (an exabyte is 1 000 petabytes) of data will be generated this year, a scale that has given rise to the term ‘big data’.

“The challenge facing our digital world,” says de Andrade, “is to find ways of helping businesses to locate and extract information nuggets and present them when and where employees need them. Data mining – or business intelligence – has become ever more sophisticated in attempting to achieve this.

“The first step is to improve the accuracy of underlying information. Data integrity is an essential prerequisite for consolidating and integrating data from multiple sources. It’s the only way to ensure that the data is reliable and credible, and gives the company a holistic view of its operations, that is ‘a single version of the truth’.

“Taking bad data and putting good-looking graphics and analytic capabilities on top does not make for good business intelligence. Companies need to be clear on their strategic objectives and implement analytical systems in a methodical way. It’s important to define the long-term objectives of the organisation, decide which business drivers are critical for achieving these, and then derive key performance indicators (KPIs) for measuring progress.”

However, adds de Andrade, organisations must avoid the mistake of adding more and more KPIs for monitoring. Instead of being overwhelmed by the volume of data, they should review the key aspects of their business and analyse only the ones that really matter. “That’s the only way to avoid information overload,” he says. “The brain can only process so much. Looking at dozens of KPIs doesn’t help executives run the company. The business may have changed in the past few years, and some KPIs may no longer be relevant. They need to be swept away.

“For example, if a retail strategy is to increase market share, it’s necessary to monitor customer spend, so as to ascertain whether current customers start spending more money, or whether the firm is attracting new customers, or both. It may also be important to measure the company’s rate of growth compared to its competitors. These KPIs then form part of a business intelligence system that measures whether or not the growth strategy is being implemented effectively.

“However, difficulties arise when it comes to making sense of unstructured information. Less than 20% of business information is held in a structured format, such as a database, where it can be relatively easily managed and analysed. It’s the remaining 80%, the unstructured data, that provides the challenge. This data includes photos, e-mails, personal filing systems and information on the Internet.

“The latest analytical tools are able to integrate structured and unstructured data and display massive amounts of information visually in an accessible way. Displaying information visually makes it easier for the brain to grasp complex issues and relationships.

“This makes it possible for organisations to discover crucial relationships buried in huge, complex, dynamic information collections with hard-to-discern links, and to use them for forecasts almost in real time, giving them an edge on their competitors.”
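
De Andrade's retail example, monitoring customer spend to see whether growth comes from current customers spending more or from new customers, can be illustrated with a small calculation. The Python sketch below is only an illustration under stated assumptions: the function name, the (customer, amount) transaction layout and the sample figures are hypothetical and are not taken from EnterpriseWorx or any particular BI product.

```python
def spend_growth_breakdown(previous, current):
    """Split the change in spend between two periods by customer type.

    previous, current: lists of (customer_id, amount) transactions.
    This layout is an assumption made for the illustration.
    """
    def totals(transactions):
        # Sum spend per customer for one period.
        result = {}
        for customer_id, amount in transactions:
            result[customer_id] = result.get(customer_id, 0) + amount
        return result

    prev_totals = totals(previous)
    curr_totals = totals(current)

    # Change in spend among customers who bought in both periods.
    existing_change = sum(
        curr_totals[c] - prev_totals[c] for c in curr_totals if c in prev_totals
    )
    # Spend from customers who appear only in the current period.
    new_customer_spend = sum(
        amount for c, amount in curr_totals.items() if c not in prev_totals
    )
    # Spend lost from customers who did not return.
    lost_customer_spend = sum(
        amount for c, amount in prev_totals.items() if c not in curr_totals
    )

    return {
        "existing_change": existing_change,
        "new_customer_spend": new_customer_spend,
        "lost_customer_spend": lost_customer_spend,
        "total_change": sum(curr_totals.values()) - sum(prev_totals.values()),
    }


# Hypothetical sample data: customer "a" spends more, "b" leaves, "c" is new.
last_quarter = [("a", 100), ("b", 50)]
this_quarter = [("a", 120), ("a", 30), ("c", 40)]
breakdown = spend_growth_breakdown(last_quarter, this_quarter)
```

Keeping the three components separate reflects the point made above about choosing only the KPIs that matter: the total change alone cannot show whether a growth strategy is working through existing customers, new customers, or both.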