Apr 30

Data Lineage—The Lifecycle Of Your Data

by Denise P. Kalm 

Another day, another data breach. And even as IT experts, how many of us feel confident that we know who has our data? And which data? The revelation that your Facebook information was a factor in a number of elections has to make you wonder and, perhaps, worry. Even if you aren’t guilty of “over-sharing,” many sites change their privacy settings frequently, putting the onus on you to constantly monitor the settings on every site you visit. Who has time?

The Impact at Work

While we have a personal stake in data security (our own data is out there), we also have a professional responsibility to ensure our customers’ data is secure. This starts with understanding “data lineage.” Simply stated, it is where your data started, where it moved to, and where it ends up. It’s the “lifecycle” of a piece of data.

As mainframers, we know that the major applications at our companies are using data housed on mainframe databases. We may not have been told about it, but we see the results on our reports. (Distributed folks never told us this was happening, which made it a lot harder to manage.) But this is the reality, and every IT worker must be aware of their role in keeping customer data secure no matter where it goes.

Why is this a Problem Now?

With speed comes the option to do more work, faster. With additional capacity (multi-terabyte drives) comes the option to store more data cheaply. The combination means that more companies are mining incredible volumes of data to better understand their business and their customers. More data simply means more exposure, and the more people who can look at it, the higher the risk of a data breach that will impact you and your customers. Since we’re often customers of the companies we work at, we have an additional incentive to look at what is actually going on with our data.

What Data Matters?

I remember a time when some companies were terrified of sharing their SMF/RMF data with a vendor, even when this could produce helpful insights into systems performance. Most of us know that there isn’t a lot of information in that data that could help someone get a competitive edge, nor is there any customer data. But think of this: you have thousands of databases and files. How many of them contain customers’ addresses? Social Security numbers? Bank account information?

Few companies have a list of the sensitive data fields they collect, and without this, you can’t even begin to figure out where this data might be. Remember the Y2K problem? We had a terrible time finding all the places a date was referenced. This is a far bigger job, and yet it is essential for business survival to get a handle on it.

If you ask some IT people where the data is, they’ll say on a drive or in a tape silo. But that’s only the beginning (or end) of data lineage. Data now flows out to customers, business partners and, sometimes, the government. You have to understand what data is sensitive before you can start looking for it. That will depend on your industry, but you want to err on the side of over-protecting rather than skipping something that will hurt you down the road.
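As a rough illustration of starting from a list of sensitive fields, the sketch below checks field names in a few dataset layouts against name patterns for addresses, Social Security numbers and bank accounts. Everything in it is an assumption made for the example: the dataset names, the field names, the patterns and the helper find_sensitive_fields are hypothetical. Matching on names alone will miss badly named fields, so treat it as a first pass rather than a complete discovery method.

```python
import re

# Hypothetical catalog of sensitive categories and the name patterns that hint at them.
SENSITIVE_PATTERNS = {
    "address": re.compile(r"addr|street|zip|postal", re.IGNORECASE),
    "ssn": re.compile(r"ssn|social_sec", re.IGNORECASE),
    "bank_account": re.compile(r"acct|account|iban|routing", re.IGNORECASE),
}


def find_sensitive_fields(schemas):
    """Return {category: [(dataset, field), ...]} for field names matching a pattern."""
    hits = {category: [] for category in SENSITIVE_PATTERNS}
    for dataset, fields in schemas.items():
        for field in fields:
            for category, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(field):
                    hits[category].append((dataset, field))
    return hits


# Made-up dataset layouts, for illustration only.
schemas = {
    "CUSTOMER.MASTER": ["CUST_ID", "CUST_NAME", "STREET_ADDR", "SSN"],
    "BILLING.HISTORY": ["CUST_ID", "ACCT_NO", "AMOUNT"],
    "PERF.SMF_SUMMARY": ["SYSTEM_ID", "CPU_PCT"],
}

hits = find_sensitive_fields(schemas)
for category, locations in hits.items():
    print(category, locations)
```

In this made-up inventory, the performance summary produces no matches, which fits the point above that SMF/RMF-style data rarely holds customer information.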

The Cost of Ignoring the Issue

But what is the cost? First, you will lose customers, especially if you let a lot of time elapse between the breach and the communication. Second, you will lose trust, which costs you prospective customers as well as business partners who will be reluctant to share data with you if you can’t protect it.

The government gets involved, too. There are local, state, federal and even international laws governing this. Compliance and security people need to keep up on these regulations and the reporting requirements that go with them. Failure to comply can result in fines and other penalties.

What Do You Need?

After you have your list of fields, you need to discover the lineage of each of the data items, looking at all the ways data can flow from the various applications. Depending on characteristics of the data, there may be multiple destinations for a given field. It’s complicated and time-consuming to do this manually; you need automation.
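To make the idea of tracing a field’s lineage concrete, here is a minimal sketch that treats each known data movement as an edge in a graph and walks outward from the system of origin. The flow records, system names and the trace_lineage helper are all hypothetical. In practice the flows would come from automated discovery, not a hand-typed list.

```python
from collections import deque

# Hypothetical flow records: (field, source system, destination system).
FLOWS = [
    ("SSN", "CUSTOMER.MASTER", "nightly_extract"),
    ("SSN", "nightly_extract", "data_warehouse"),
    ("SSN", "nightly_extract", "partner_feed"),
    ("SSN", "data_warehouse", "marketing_report"),
    ("CUST_NAME", "CUSTOMER.MASTER", "mailing_list"),
]


def trace_lineage(field, origin, flows):
    """Breadth-first walk: every system the field reaches from origin, in discovery order."""
    edges = {}
    for flow_field, source, destination in flows:
        if flow_field == field:
            edges.setdefault(source, []).append(destination)

    seen = {origin}
    reached = []
    queue = deque([origin])
    while queue:
        system = queue.popleft()
        for downstream in edges.get(system, []):
            if downstream not in seen:  # guard against cycles in the flows
                seen.add(downstream)
                reached.append(downstream)
                queue.append(downstream)
    return reached


ssn_lineage = trace_lineage("SSN", "CUSTOMER.MASTER", FLOWS)
print(ssn_lineage)
```

Even in this toy example, a single field fans out to several destinations, including one outside the company, which is why doing this by hand across thousands of files does not scale.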

You’ll need a solution that helps you discover data sources, application flows and other interconnections, then establishes a repository of this information to produce quick visualizations as needed. This kind of tool will let you see dynamically where and how the sensitive data is processed, so you can secure the data as appropriate. The cost of encryption is coming down, so it’s a good way to go with the most sensitive fields, but you still want to be selective.

Additional Benefits

When you look for the data lineage solution you need, you want to look beyond security and compliance and get one that provides a variety of benefits. Being able to visualize all your business processes can be incredibly valuable when you need to do maintenance. How else can you map from the business rule your analyst or business partner specifies to the underlying code? A good way to map and visualize these relations can bridge the communications gap between you and the business, as well as providing real-time documentation of your systems. This can really empower DevOps as well. 

Find a great solution and take a giant step forward in protecting your customers’ (and your own) data.