Jun 1 ’04

Data Security Adds an Extra Layer of Protection

by Editor in z/Journal

Compressing data is nothing new for the data center. Making files smaller is one of the ways IT managers ease the transmission and storage of the vast amounts of data that reside in data centers worldwide. Storage Management Subsystem (SMS) data set compression, writing data to tape, and other hardware-based compression technologies have been around for some time on the S/390, but these methods are often insufficient: they leave the IT shop wanting an easy way to move data off the host to other computing platforms to serve the needs of various business units.

A common method of transferring data is over networks. However, with electronic distribution comes additional security risks not necessarily found when transferring data on physical media. There should be no compromises in security when it comes to your corporate data.

ZIP, the file compression format created in 1989, has become a staple for data centers and other IT shops, with more than 20 percent of all mainframes worldwide using a ZIP solution. Compressing data into ZIP archives provides interoperability between platforms and considerable savings in DASD and bandwidth. Recently, strong security has also been added to the ZIP format.

Compression 101

The purpose of data compression is simply to make files smaller. Compression reduces file sizes by eliminating redundant patterns and encoding the file’s contents with symbols that take up less storage space than the original. After a file is compressed, its content is changed to an encoded form and the file cannot be used until it’s decompressed. Decompression is the inverse process: it restores the file to its original state.

As a simple example of the data compression process, consider this sentence:

she sells sea shells by the sea shore

This sentence consists of 37 characters, including spaces. The spaces are important and can’t be simply thrown away, since removal would change the meaning of the original message.

The science of compression recognizes the repeating patterns in this sentence. The combination “se” appears three times, “sh” three times, and “lls” twice. In fact, the “se” pairs all have a space in front of them, and can be represented by “ se.” These patterns define the redundancy of the message.

Each of these patterns can be encoded by replacing it with a single character. For example:

“ se” becomes “#”
“sh” becomes “$”
“lls” becomes “%”

The first replacement string includes a space at the beginning. The new form of this sentence, using these symbols, looks like this:

$e#%#a $e% by the#a $ore

The new representation is 24 characters long. This is a saving of 13 characters, or about 35 percent. Applying the decompression process to the compressed string converts the replacement symbols back into their original form, restoring the original message.
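
For readers who think in code, the substitution above can be sketched in a few lines of Python. The symbols “#,” “$,” and “%” are just the placeholders used in this example, not part of any real compression format.

REPLACEMENTS = [(" se", "#"), ("sh", "$"), ("lls", "%")]

def encode(text):
    # Replace each repeating pattern with its single-character symbol.
    for pattern, symbol in REPLACEMENTS:
        text = text.replace(pattern, symbol)
    return text

def decode(text):
    # Decompression is the inverse: put each pattern back.
    for pattern, symbol in REPLACEMENTS:
        text = text.replace(symbol, pattern)
    return text

original = "she sells sea shells by the sea shore"
packed = encode(original)
print(packed)                         # $e#%#a $e% by the#a $ore
print(len(original), len(packed))     # 37 24
assert decode(packed) == original     # the round trip is exact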

How Does It Work?

Compression is accomplished using algorithms developed to achieve the best compression results for a given type of file. An algorithm is the step-by-step procedure a program follows to compress a file. Most compression algorithms are more sophisticated than the previous example, but all work on the same principle of replacing repeating patterns with more efficient encodings.

Different file types will typically compress to different sizes. How much compression can be gained for any file, regardless of type, depends on how much redundant information it contains. Files with a high level of redundancy compress more than files with little redundancy. As a rule of thumb, binary data compresses well, text data compresses better, and database files often compress best of all.
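
A quick way to see the effect of redundancy is to run the same general-purpose compressor over highly repetitive data and over random data. The short Python sketch below uses the standard zlib module, which implements the same DEFLATE algorithm ZIP uses; the inputs are made up solely to illustrate the contrast.

import os
import zlib

# Highly redundant input: one record repeated many times.
redundant = b"customer record 0001: status=ACTIVE\n" * 1000
# Low-redundancy input of the same length: random bytes.
scrambled = os.urandom(len(redundant))

for label, data in (("redundant", redundant), ("random", scrambled)):
    packed = zlib.compress(data, 9)
    saved = 100 * (1 - len(packed) / len(data))
    print(label, len(data), "->", len(packed), "bytes,", round(saved, 1), "% saved")

The repetitive input shrinks to a tiny fraction of its size, while the random input barely compresses at all and may even grow slightly.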

There are two main types of compression: lossy and lossless.

Lossy compression assumes that it’s acceptable for the compression process to discard some of the original data to achieve more efficient compression. This means that, after a compressed file is decompressed, it’s not an exact copy of the original file since some of the data was “lost” by the compression process. This assumption is valid for some types of data such as images. For example, when a file representing an image is compressed with lossy compression, the compression process may actually discard some of the image data to achieve greater compression. Typically, the human eye cannot detect the differences between the image generated from the original file and the image generated from the compressed file after decompression. Where a loss of data through compression is acceptable, compression rates of greater than 90 percent are common.

Lossless compression takes the opposite approach. In lossless compression, it’s unacceptable to discard any data. The decompressed form of a file must exactly match the original file content. Lossless compression is used when it’s necessary to faithfully reproduce the contents of a file through decompression. Files containing words or numbers and files that are intended for further computer processing may require lossless compression. In these situations, it would not be acceptable for any of the content to be discarded or altered by the compression process. Figure 1 shows compression ratios for a sample data file.
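
The distinction can be made concrete in a few lines of Python: a lossless round trip through zlib reproduces the input byte for byte, while a deliberately lossy step (here, simply rounding numeric values, a crude stand-in for what image codecs do far more cleverly) does not.

import zlib

# Lossless: decompressing returns exactly what went in.
record = b"account=12345;balance=1048.37;status=OPEN"
assert zlib.decompress(zlib.compress(record)) == record

# Lossy stand-in: discard precision to save space.
samples = [1048.37, 17.923, 0.0051]
quantized = [round(x, 1) for x in samples]   # keep one decimal place
print(quantized)                             # [1048.4, 17.9, 0.0]
assert quantized != samples                  # some detail is gone for good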

The benefits of data compression aren’t limited to a specific operating system. Although the data used in the analysis above is based primarily on common desktop file formats, compression is available on larger enterprise platforms such as mainframes, AS/400, Unix, and Linux systems. Compression results similar to those shown in Figure 2 can be achieved on these platforms.

While lossy data compression may work for multimedia-type files, using this technique can alter the meaning of data, which makes it inappropriate for strategic data.

Preparing for Battle

As businesses extend their operations and data beyond traditional corporate and IT boundaries, the need to securely compress data becomes more compelling.

IT, security, and data center managers must now determine how safe their data really is.

Security officers faced with protecting digital assets have woven together a fabric of systems and technologies to help manage the risk and protect their corporate IT operations. The problem is too big and amorphous for a one-size-fits-all solution.

Data Security

Firewalls act as a protective perimeter to your IT assets, but they’re only one part of the security fabric. What happens when someone has breached your security wall? If you’re fortunate, there’s a virtual guard dog waiting to stop further access. Adding RACF, ACF2, or even Top Secret to the data processing environment is the first layer of protection for your mainframe. This is great as long as you don’t plan to move the data to other environments.

IT organizations spend tremendous time and effort protecting the network, configuring firewalls, isolating critical back-end processes, and keeping their servers running. However, the inevitable still happens. You probably know of a company that has fallen victim to a malicious attack, costing it millions of dollars in production downtime. So, how can one prevent such an invasive, costly event? It’s a challenge, given the disparity of the platforms commonly deployed and the growing need to network with partners and ensure their systems use secure, effective technology.

The challenge is greater and more imminent for companies that fall under strict regulatory guidelines.

Encryption

Data centers can save money by combining encryption and compression. Encryption usually increases file size, making encryption-only security solutions infrastructure-intensive, but combining the two techniques yields significant file size reductions. Figure 3 shows an example of a document encrypted with a typical industrial-strength algorithm.

When encryption is used alone, the resulting files are typically significantly larger than the originals. If the same files are compressed before being encrypted, each file is significantly smaller than when encryption is used alone. In the same data file example, shown in Figure 4, each file is more than 50 percent smaller than its original size.
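
The article’s figures come from an unnamed industrial-strength algorithm, so as an illustration only, the sketch below pairs Python’s zlib with the Fernet recipe from the third-party cryptography package. Fernet Base64-encodes its output, which is one reason the encrypt-only result comes out larger than the original; the document content is invented purely for the size comparison.

import zlib
from cryptography.fernet import Fernet   # third-party: pip install cryptography

cipher = Fernet(Fernet.generate_key())

# Stand-in for a redundant business document.
document = b"invoice line; qty=1; unit price=10.00; total=10.00\n" * 2000

encrypt_only = cipher.encrypt(document)
compress_then_encrypt = cipher.encrypt(zlib.compress(document, 9))

print("original:             ", len(document), "bytes")
print("encrypt only:         ", len(encrypt_only), "bytes")
print("compress then encrypt:", len(compress_then_encrypt), "bytes")

# Recovery reverses the steps: decrypt first, then decompress.
assert zlib.decompress(cipher.decrypt(compress_then_encrypt)) == document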

For the typical IT organization, the task of transferring large files over the network is usually scheduled at night to minimize the disruption to network performance for the organization’s users. After these files are compressed, sometimes by as much as 90 percent or more, the transfer requires less dedicated bandwidth, which not only translates directly into cost savings but also gives the IT organization more flexibility in scheduling large file transfers.
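
To put the bandwidth argument in rough numbers, a quick calculation (using purely hypothetical figures for the batch size and link speed) shows the difference:

# Hypothetical nightly transfer; every figure here is made up for illustration.
batch_gb = 50        # uncompressed batch output
packed_gb = 5        # assuming roughly 90 percent compression
link_mbps = 10       # dedicated WAN bandwidth

def hours(size_gb, mbps):
    bits = size_gb * 8 * 1000**3          # decimal gigabytes to bits
    return bits / (mbps * 1000**2) / 3600

print(round(hours(batch_gb, link_mbps), 1), "hours uncompressed")   # about 11.1
print(round(hours(packed_gb, link_mbps), 1), "hours compressed")    # about 1.1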

Evaluating Solutions

Here are some best practices IT professionals can follow in selecting a ZIP-based data security solution:

The security scrap heap is full of solutions that didn’t work. ZIP compression, already in use in virtually all data centers today, provides a robust platform for delivering strong security that is reliable, deployable, and usable. The traditional ZIP standard includes password-based security, and several companies have extended it further to include strong security, including the U.S. government’s Advanced Encryption Standard (AES). By using ZIP to secure and compress data in storage, data centers can reduce security risk and pack more data onto the same media. Deploying a ZIP implementation requires minimal infrastructure, training, and support. By adding this layer of data security to the security mix, your organization can count on further protection for its most important enterprise asset: data.
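
As a concrete illustration of the ZIP-plus-AES approach described above, here is a minimal Python sketch. It assumes the third-party pyzipper package, since the standard zipfile module does not write encrypted archives, and the file names and password are placeholders rather than any vendor’s product or API.

import pyzipper   # third-party AES-capable ZIP library: pip install pyzipper

ARCHIVE = "payroll_extract.zip"      # hypothetical file names
SOURCE = "payroll_extract.txt"
PASSWORD = b"use-a-real-secret-here"

# Write: compress with DEFLATE and protect the entry with WinZip-style AES.
with pyzipper.AESZipFile(ARCHIVE, "w",
                         compression=pyzipper.ZIP_DEFLATED,
                         encryption=pyzipper.WZ_AES) as zf:
    zf.setpassword(PASSWORD)
    zf.write(SOURCE)

# Read back: the same password decrypts and decompresses the entry.
with pyzipper.AESZipFile(ARCHIVE) as zf:
    zf.setpassword(PASSWORD)
    restored = zf.read(SOURCE)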