May 5 ’11

Tackling New Challenges in Data Management on the Mainframe

by John Barnard in Mainframe Executive

Have you ever seen the YouTube videos of the soda and candy experiment, where someone drops several pieces of a certain mint-flavored candy into a bottle of diet carbonated soda? Suddenly, huge amounts of soda erupt and explode in every direction. You might say that, over the past several years, data has exploded in a similar way in terms of volume, types, and content. In addition, there’s a greater demand to access data quickly.

But how do you manage these massive volumes of data that keep growing? The established methodologies for managing data and databases probably aren’t going to be effective in the future. Further, data plays a new role in today’s business, and the IT department needs to support that role by quickly and effectively creating information from the data. One example of turning data into information is an application that reconfigures data to display it effectively on a mobile device. Another is a business intelligence solution that mines and analyzes stored data. So how do you take that data in its raw form and process, arrange, and configure it so you can present it in a way that provides information or knowledge?

Five Data Management Challenges

With the skyrocketing volume of data, companies are facing major challenges to remain competitive and continue to meet customer demands. The following examines five such challenges and how you can tackle them: 

1. Extract, Transform, and Load (ETL): ETL goes hand-in-hand with application modernization. It’s all about companies providing services to their customers in a simplified way by modernizing legacy applications, user interfaces, and the middleware that ties them all together from an enterprisewide perspective. But sometimes it isn’t enough to merely modernize the applications. Sometimes the data itself needs to be transformed into a more appropriate format and then reloaded for the updated application to use.

If your company requires ETL, you don’t need to build it yourself. A variety of ETL solutions exist, but be sure to select one that meets your unique business requirements. For example, if you’re considering moving some applications to the cloud, ETL is key to making that move effective. In many cases, you will be looking at large amounts of data that need to be extracted, transformed, and quickly reloaded into the cloud, so make sure the ETL vendor can support you in this regard. Moreover, performance is particularly critical for many applications, whether the environment is virtualized or not. If it takes an hour to extract, transform, and load a terabyte of data, that may be too long to meet your business requirements.

Look for an ETL solution that’s robust, efficient, and scalable. Even if you have only a small amount of data to manage today, keep in mind that the volume of data is continually growing. Be sure to select a solution that can meet your future needs.
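To make the extract-transform-load flow itself concrete, independent of any particular vendor’s product, here is a minimal Python sketch using in-memory SQLite stand-ins for a source and a target store. The table names, column layout, and the cents-to-dollars and date-format transformations are hypothetical placeholders, not a real system’s schema:

    # Minimal ETL sketch with hypothetical legacy_orders / modern_orders tables.
    import sqlite3
    from datetime import datetime

    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE legacy_orders (order_id TEXT, amount_cents INTEGER, order_date TEXT)")
    source.execute("INSERT INTO legacy_orders VALUES ('O-1', 249900, '05/05/2011')")

    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE modern_orders (order_id TEXT, amount REAL, order_date TEXT)")

    def extract(conn):
        # Pull the raw rows from the legacy table.
        return conn.execute("SELECT order_id, amount_cents, order_date FROM legacy_orders").fetchall()

    def transform(rows):
        # Convert cents to a decimal amount and normalize the date format.
        return [(order_id,
                 cents / 100.0,
                 datetime.strptime(date, "%m/%d/%Y").strftime("%Y-%m-%d"))
                for order_id, cents, date in rows]

    def load(conn, rows):
        # Bulk-insert the cleaned rows into the modernized table.
        conn.executemany("INSERT INTO modern_orders VALUES (?, ?, ?)", rows)
        conn.commit()

    load(target, transform(extract(source)))
    print(target.execute("SELECT * FROM modern_orders").fetchall())

In practice the extract, transform, and load stages would run in bulk and in parallel, which is where the performance and scalability questions above come in.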

2. Data integration: Data doesn’t always exist in the same form. Data integration allows data from multiple sources, multiple data stores, and multiple types of databases to be translated into a common data structure or architecture, which makes it easier to write and access the data and turn it into information.

For example, consider a scenario where you’ve just refinanced your mortgage with the financial institution where you hold a checking account. You may, in fact, receive a new offer to refinance the week after you sign off on your new mortgage. This could happen if this financial institution has separate databases for general customers and for mortgage customers. If the data were integrated, the company could save the time and effort of mailing mortgage refinance offers to customers who’ve just refinanced. Sending these offers out just after someone has refinanced can lower the bank’s credibility and confuse the customer. It can also cost money.

Data integrated just for the sake of integrating it isn’t very useful. Be sure to understand the business requirements for integrating data. Also, determine whether the data is brought together logically into a federated database, pulled into a virtual database, or brought together physically. To a large extent, effective integration requires recognizing the key structures that can be used to bring the data together in a useful fashion, and it’s essential to understand the data content. That content is critical to creating useful information.

You’ll also want to understand how often the data is actually needed and for what purposes it’s being used. For example, if you need to do an end-of-quarter analysis of the number of hits on the company Website, the number of customers acquired, or the number of processes run, that’s quite different from needing data to understand an operational issue that occurred 20 minutes ago.
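As a rough illustration of what integration buys you in the refinance example above, the following sketch maps records from two hypothetical, separately managed stores into one common structure keyed by a shared customer identifier. The store layouts and field names are invented for illustration:

    # Federated-view sketch: two source stores, one common schema.
    checking_customers = {
        "C1001": {"name": "A. Smith", "address": "12 Main St"},
    }
    mortgage_customers = {
        "C1001": {"customer_name": "A. Smith", "refinanced_on": "2011-04-28"},
    }

    def integrated_view(customer_id):
        # Translate both source records into a single, common structure.
        checking = checking_customers.get(customer_id, {})
        mortgage = mortgage_customers.get(customer_id, {})
        return {
            "customer_id": customer_id,
            "name": checking.get("name") or mortgage.get("customer_name"),
            "address": checking.get("address"),
            "recently_refinanced": "refinanced_on" in mortgage,
        }

    # A marketing job could now skip refinance offers for customers whose
    # integrated record already shows a recent refinance.
    record = integrated_view("C1001")
    if record["recently_refinanced"]:
        print("Skip refinance mailing for", record["customer_id"])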

3. Master Data Management (MDM): MDM is a set of processes and procedures that allows companies to have a consistent view of data, such as customer addresses, throughout the enterprise. With effective data integration techniques, you should be able to ensure this consistency and even consider reducing the number of locations where a given piece of data is stored.

To address this challenge, you need to know whether the data itself should exist in multiple copies and in multiple places. Having too many copies of the data is often not so much an issue of the price of storage as of the cost of administering that storage and the multiple copies.

There are two ways to implement an MDM strategy. The first is to bring the data together in one place, which requires a great deal of work around the applications, most likely including application modernization, and may also require architectural changes. This is the most difficult and costly approach, but the end result is clean.

The other approach is to treat MDM as virtual or federated database management, with a set of processes and applications that allow the data to exist in multiple locations while remaining intact and synchronized across them. In essence, you ensure the federated data and databases are consistent with one another. With this approach, you recognize that the same information exists in multiple places and you don’t spend the money to convert applications; instead, you make sure the processes are in place to verify that the data stays consistent throughout. This approach is less costly, and the end result can be very effective as well.
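A small sketch of the federated flavor of MDM follows. It assumes a hypothetical customer address held in three systems and simply flags disagreements for reconciliation rather than physically consolidating anything; the system names and values are illustrative:

    # Federated MDM sketch: one master data element held in several systems,
    # with a reconciliation pass that reports inconsistencies.
    address_by_system = {
        "billing":   {"C1001": "12 Main St, Springfield"},
        "shipping":  {"C1001": "12 Main Street, Springfield"},
        "marketing": {"C1001": "12 Main St, Springfield"},
    }

    def find_inconsistencies(customer_id):
        # Collect each system's view of the address and report disagreements.
        values = {system: data.get(customer_id) for system, data in address_by_system.items()}
        return values if len(set(values.values())) > 1 else None

    mismatch = find_inconsistencies("C1001")
    if mismatch:
        print("Address out of sync across systems:", mismatch)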

4. Security and access: Although the mainframe is highly secure, the issue remains that a user with full access, such as a DBA or someone with super-user authority, can deliberately or inadvertently view or change business-critical information or data that should be secure. Companies need to find a way to ensure that no unauthorized person has access to certain data.

Another area where you need to be concerned about data access is the public cloud. Many of the issues around mobile devices, the Internet, and Web-enabled applications accessing back-end data have been solved. However, the game changes to a certain extent with public clouds, because data security and information security aren’t always fully addressed by both the providers and the consumers of the public cloud.

The primary way companies are addressing the overall security and access issue is to use logs to discern which users are updating and/or accessing data. This can be an extensive, tedious process, but it becomes less so if you adopt a formal data access management approach. With this approach, instead of reviewing log information pulled out of databases after the fact, you focus on understanding what data is being accessed and/or updated in real time. It’s a less onerous approach, and solutions are available that consolidate this very granular data into one database. Leveraging these solutions enables compliance officers and security staff to understand what’s happening with the data. In addition, some solutions can prohibit updates or access to certain data, even from an otherwise trusted employee or customer. In essence, this approach provides security down to the data component level rather than just at an application or database level.

In a way, controlling access is similar to audit management. For example, an audit manager can bring information together, maintain a historical archive of activities, and through data mining show that User X logs in at 2 a.m. once a month and performs action Y on a specific set of data. It may be that this process is perfectly valid and that 2 a.m. is simply when that person likes to get his work done. On the other hand, it might be worth looking into the activity and talking to the person to better understand what’s going on.
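The following sketch illustrates the consolidated, monitoring-style approach in miniature: access events from different sources land in one place and off-hours activity by privileged users is flagged for review. The event records, field names, and the off-hours window are illustrative assumptions, not any particular product’s format:

    # Data access monitoring sketch: consolidate access events and flag
    # off-hours activity for a compliance or security review.
    from datetime import datetime

    access_events = [
        {"user": "dba_x", "table": "PAYROLL", "action": "UPDATE",
         "timestamp": "2011-05-01T02:04:00"},
        {"user": "app_user", "table": "ORDERS", "action": "SELECT",
         "timestamp": "2011-05-01T10:15:00"},
    ]

    def off_hours(event, start_hour=0, end_hour=5):
        # Treat anything between midnight and 5 a.m. as worth a second look.
        hour = datetime.fromisoformat(event["timestamp"]).hour
        return start_hour <= hour < end_hour

    for event in (e for e in access_events if off_hours(e)):
        print("Review:", event["user"], event["action"], event["table"], event["timestamp"])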

5. Change Data Capture (CDC): Disaster Recovery (DR) is a well-established issue in data management. What’s new is the idea of running what’s typically called active-active or hot-hot environments. With this approach, you replicate the data in real-time between what was previously referred to as the production and DR environments. Now, both environments are active and can be running production work on the same data, at the same time, from two different sites.

This new approach requires improved CDC so that companies can take the changed portion of a database and quickly transfer it to the other active site, enabling near-concurrent processing there. The two sites can actually share the workload processing.
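Conceptually, CDC ships only the delta rather than the whole database. The sketch below illustrates the idea with a hypothetical change log and sequence numbers; it is not any replication product’s actual API:

    # Change data capture sketch: each committed change carries a sequence
    # number, and only changes past the last applied sequence are shipped
    # to the second active site and replayed there.
    change_log = [
        {"seq": 101, "table": "ACCOUNTS", "key": "A42", "column": "BALANCE", "value": 1500},
        {"seq": 102, "table": "ACCOUNTS", "key": "A42", "column": "BALANCE", "value": 1375},
    ]

    def changes_since(last_applied_seq):
        # Only the delta is transferred, not the full database.
        return [c for c in change_log if c["seq"] > last_applied_seq]

    def apply_to_site(site_state, changes):
        # Replay the captured changes against the other site's copy of the data.
        for change in changes:
            site_state[(change["table"], change["key"], change["column"])] = change["value"]
        return max((c["seq"] for c in changes), default=None)

    site_b = {}
    last_seq = apply_to_site(site_b, changes_since(100))
    print("Site B now at sequence", last_seq, site_b)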

Having both systems identical, or holding identical copies of the same data, is in many ways better than the historical DR approaches. By definition, the data is in sync and intact, and all applications run in both places. In the event of a disaster, the second site can take over immediately. However, this approach also allows workload processing to be split between the sites.

Of course, not every business process needs to be in an active-active environment, but this approach is very effective for your critical business processes. For some organizations, the most critical business process might be technical support. For others, it may be order processing on consumer retail Websites or split-second trading on the stock market. Once you determine your critical business processes, decide how often the data should be synchronized to meet your business requirements. It could be every two seconds, or every second. Then, be sure the bandwidth and processing are in place to make that happen.

Look Beyond the Data

The huge—and constantly growing—amount of data available is interesting, but applications are the key to tying data together to make it useful. It’s important to look at the requirements for data, data access, and data security; however, be sure you also understand the applications and transactions that access and/or update that data. By leveraging the technology that addresses these key issues, you can effectively tackle new challenges in data management while improving service delivery to your end users.