General, 7.08.2012

Big data – do you really need it in your production database?

By Jaroslav Cerny, CEO at RDB Consulting

Big data is all the buzz, as organisations scramble to leverage their mountains of data to drive business insight. The question this raises, however, is how much data a business actually needs. Organisations sit with terabytes of data, much of it several years old, all kept in the production database for instant access. The reality is that this is often not required for everyday business operations. While governance and regulations may require that certain data be kept for legal purposes, it is not necessary to store this data in expensive, ‘instant access’ databases. Historical data can be archived, saving money and time and helping organisations to use the right big data to make better business decisions.

As content generation continues to explode, data storage strategies have become increasingly important for businesses. Effectively managing this data should be a top priority, for greater cost effectiveness and efficiency. When it comes to big data, storing all data up front in the production database is not only costly but also impractical, and can in fact decrease everyday server performance. Organisations need to have a strategy in place to reduce storage costs in the face of exponential data growth, optimise performance according to the needs of the business and mitigate the risk of lost data and information.

The production database should contain only the current data that is needed for the day-to-day business and operations of the organisation. This database should feature high performance capabilities to deliver this data to users quickly and efficiently. However, if it is being used to store data that is not needed for everyday use, and becomes ‘heavy’ or bogged down with data, performance will inevitably be compromised.

Data cleansing and consolidation can assist with reducing data volumes in the production database, but this is often not enough to deliver the required performance gains. A strategy needs to be put in place to ensure that data is archived, removed from production and stored on more cost-effective options. This strategy, however, must be linked to the business and its needs, including its daily operations. If data storage, retention and archiving strategies are not in line with the needs of the business, users will not be able to access the data they need, when they need it and as fast as they need it. It is vital to first understand the needs of the business and then put rules in place around archiving. This means that archiving is not simply an IT decision, but a business decision as well, and database administrators need to understand the business in order to provide advice for a better strategy.

With data maintenance plans and archiving strategies in place, data can be moved out of the production environment onto archive servers, where users can still access it easily but where it no longer affects the performance of the production server. This will make searching faster and increase performance when accessing or creating data. Historical data will take longer to access, but since it is not needed as often, the performance gains on daily data outweigh the minor inconvenience.
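As a rough illustration of moving historical rows out of production, the sketch below copies everything older than a cutoff date into an archive database and only then deletes it from production. It is a minimal sketch under stated assumptions, not a recommended implementation: SQLite stands in for both the production and the archive server, and the `orders` table, its columns, and the `open_db` and `archive_old_rows` helpers are hypothetical names chosen for the example. In practice the cutoff would come from the retention rules agreed with the business.

```python
import sqlite3

# Hypothetical table used by both the production and the archive database.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS orders ("
    "id INTEGER PRIMARY KEY, created TEXT NOT NULL, amount REAL NOT NULL)"
)


def open_db(path=":memory:"):
    """Open a database and make sure the example table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def archive_old_rows(production, archive, cutoff):
    """Move orders created before `cutoff` (ISO date string) to the archive.

    Rows are copied and committed to the archive first and deleted from
    production afterwards, so a failure in between leaves a duplicate
    rather than a lost row. Returns the number of rows moved.
    """
    rows = production.execute(
        "SELECT id, created, amount FROM orders WHERE created < ?",
        (cutoff,),
    ).fetchall()
    if not rows:
        return 0

    # Step 1: copy to the archive; OR IGNORE makes a re-run after a
    # partial failure harmless.
    with archive:
        archive.executemany(
            "INSERT OR IGNORE INTO orders (id, created, amount) "
            "VALUES (?, ?, ?)",
            rows,
        )

    # Step 2: remove the archived rows from production.
    with production:
        production.executemany(
            "DELETE FROM orders WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)
```

A call such as `archive_old_rows(prod, arch, "2012-01-01")` would leave only orders from 2012 onwards in production. Dates are stored as ISO strings here so that a plain string comparison orders them correctly; a real system would use the date type and bulk-transfer tools of its own database.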

Separating current and historical data in this way will also bring down the costs of hardware, software and licensing, saving organisations money. Production databases must deliver high performance, which means higher cost. Database size, server memory, CPU count and licensing are interlinked. Companies that keep historical data on the production server will need to spend more on ensuring high performance and on licensing that is based on CPUs. Archive servers do not need to provide the same levels of performance, so lower-cost, lower-specification servers can be used for this purpose. This approach also means that maintenance on the production environment is easier: rebuilding indexes is quicker and backups run faster. In general, performance and uptime will be maximised. The archive server can also be used as a quality control environment to validate data integrity in a safer manner, since doing this on the production server can have a negative impact on business performance.

Ultimately, the rules of data storage strategy are simple. The production database should contain only the data needed for day-to-day operations, and all historical data should be moved onto an archive server. This will allow the production server to be streamlined and deliver the best possible performance, and will optimise the cost of maintaining and running storage. This in turn will allow organisations to deal with big data in a more intelligent fashion, comply with regulations around data retention, and make more agile decisions based on current data thanks to optimised system performance.
