Jun 1 ’04
Data Security Adds an Extra Layer of Protection
Compressing data is nothing new for the data center. Making files smaller is one of the ways IT managers ease the transmission and storage of the vast amounts of data that reside in data centers worldwide. Storage Management Subsystem (SMS) data set compression, writing data to tape, and other hardware-based compression technologies have been available on the S/390 for some time, but these methods are often insufficient: they leave the IT shop wanting the capability to easily move data off the host and onto other computing platforms to serve the needs of various business units.
A common method of transferring data is over networks. However, with electronic distribution comes additional security risks not necessarily found when transferring data on physical media. There should be no compromises in security when it comes to your corporate data.
ZIP, the file compression format introduced in 1989, has become a staple for data centers and other IT shops, with more than 20 percent of all mainframes worldwide using a ZIP solution. Compressing data into ZIP archives provides interoperability between platforms and considerable savings in DASD and bandwidth. Recently, strong security has also been added to the ZIP format.
The purpose of data compression is simply to make files smaller. Compression reduces file sizes by eliminating redundant patterns and encoding the file's contents using symbols that require less storage space than the original. After a file is compressed, its content is changed to an encoded form and the file cannot be used until it's decompressed. The decompression process is the inverse of compression: it restores a file to its original state.
As a simple example of the data compression process, consider this sentence:
she sells sea shells by the sea shore
This sentence consists of 37 characters, including spaces. The spaces are important and can’t be simply thrown away, since removal would change the meaning of the original message.
The science of compression recognizes the repeating patterns in this sentence. The combination “se” appears three times, “sh” three times, and “lls” twice. In fact, the “se” pairs all have a space in front of them, and can be represented by “ se.” These patterns define the redundancy of the message.
Each of these patterns can be encoded by replacing them with a single character. For example:
- # = " se"
- $ = "sh"
- % = "lls"
The first replacement string includes a space at the beginning. The new form of this sentence, using these symbols, looks like this:
$e#%#a $e% by the#a $ore
The new representation is 24 characters long. This is a saving of 13 characters, or about 35 percent. Applying the decompression process to the compressed string would result in the replacement symbols being converted back into their original form, restoring the original message.
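The substitution scheme above can be sketched in a few lines of Python. The sentence and the symbol table (#, $, %) come straight from the example; note that the order of replacements matters, since "sh" and "lls" must be substituted before " se".

```python
sentence = "she sells sea shells by the sea shore"
assert len(sentence) == 37  # including spaces

# Replacement order matters: substituting "sh" and "lls" first leaves
# the " se" pairs intact for the final pass.
rules = [("sh", "$"), ("lls", "%"), (" se", "#")]

compressed = sentence
for pattern, symbol in rules:
    compressed = compressed.replace(pattern, symbol)

print(compressed)       # $e#%#a $e% by the#a $ore
print(len(compressed))  # 24 characters, a saving of 13

# Decompression reverses the substitutions in the opposite order.
restored = compressed
for pattern, symbol in reversed(rules):
    restored = restored.replace(symbol, pattern)

assert restored == sentence  # the original message is recovered exactly
```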
How Does It Work?
Compression is accomplished using algorithms developed to achieve the best compression results for a given type of file. A compression algorithm is a precisely defined procedure, implemented in software or hardware, that carries out the steps needed to compress a file. Most compression algorithms operate in a more complex manner than the previous example, but all operate on the same principle of replacing repeating patterns with more efficient encodings.
Different file types will typically compress to different sizes. The amount of compression that may be gained for any file, regardless of type, depends on how much redundant information it contains. Files with a high level of redundant information will compress more than files with low redundancy. Binary data compresses well, text data compresses better, and databases compress the best.
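The effect of redundancy on compressibility is easy to demonstrate with Python's zlib module, which implements the same DEFLATE algorithm used in ZIP archives. In this sketch, a highly repetitive text buffer and a buffer of random bytes of the same length are compressed side by side; the sample data is made up for illustration.

```python
import os
import zlib

# Highly redundant text vs. random bytes of the same length.
text = b"inventory record: widget, qty 100\n" * 300
random_bytes = os.urandom(len(text))

text_out = zlib.compress(text, level=9)
random_out = zlib.compress(random_bytes, level=9)

# The redundant text shrinks dramatically; the random data barely changes,
# because there are no repeating patterns for the algorithm to exploit.
print(len(text), "->", len(text_out), "bytes (redundant text)")
print(len(random_bytes), "->", len(random_out), "bytes (random data)")
```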
There are two main types of compression: lossy and lossless.
Lossy compression assumes that it’s acceptable for the compression process to discard some of the original data to achieve more efficient compression. This means that, after a compressed file is decompressed, it’s not an exact copy of the original file since some of the data was “lost” by the compression process. This assumption is valid for some types of data such as images. For example, when a file representing an image is compressed with lossy compression, the compression process may actually discard some of the image data to achieve greater compression. Typically, the human eye cannot detect the differences between the image generated from the original file and the image generated from the compressed file after decompression. Where a loss of data through compression is acceptable, compression rates of greater than 90 percent are common.
Lossless compression takes the opposite approach. In lossless compression, it’s unacceptable to discard any data. The decompressed form of a file must exactly match the original file content. Lossless compression is used when it’s necessary to faithfully reproduce the contents of a file through decompression. Files containing words or numbers and files that are intended for further computer processing may require lossless compression. In these situations, it would not be acceptable for any of the content to be discarded or altered by the compression process. Figure 1 shows compression ratios for a sample data file.
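The defining property of lossless compression, an exact round trip, can be verified directly. This sketch uses zlib (the DEFLATE compressor behind ZIP) on a made-up sample string:

```python
import zlib

# Redundant text compresses well; the decompressed output must match exactly.
original = b"she sells sea shells by the sea shore " * 100

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

assert restored == original  # lossless: byte-for-byte identical
print(len(original), "->", len(compressed), "bytes")
```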
The benefits of data compression aren’t limited to a specific operating system. Although the data used in the analysis above is based primarily on common desktop file formats, compression is available on larger enterprise platforms such as mainframes, AS/400, Unix, and Linux systems. Similar compression results to those shown in Figure 2 can be achieved on these platforms.
While lossy data compression may work for multimedia-type files, using this technique can alter the meaning of data, which makes it inappropriate for strategic data.
Preparing for Battle
As businesses extend their operations and data beyond traditional corporate and IT boundaries, the need to securely compress data becomes more compelling.
IT, security, and data center managers must now determine whether their data is safe from being stolen, destroyed in a disaster, or otherwise compromised, wherever it resides:
- In the data center
- In corporate systems
- In files representing intellectual property
- In executive communications.
Security officers faced with protecting digital assets have woven together a fabric of systems and technologies to help manage the risk and protect their corporate IT operations. The problem is too big and amorphous for a one-size-fits-all solution.
Firewalls act as a protective perimeter to your IT assets, but they’re only one part of the security fabric. What happens when someone has breached your security wall? If you’re fortunate, there’s a virtual guard dog waiting to stop further access. Adding RACF, ACF2, or even Top Secret to the data processing environment is the first layer of protection for your mainframe. This is great as long as you don’t plan to move the data to other environments.
IT organizations spend tremendous time and effort protecting the network, configuring firewalls, isolating critical back-end processes, and keeping their servers running. However, the inevitable still happens. You probably know a company or someone who has fallen victim to a malicious attack, costing that company millions of dollars in production downtime. So, how can one prevent such an invasive, costly event? It’s a challenge, given the disparity of the platforms commonly deployed and the growing need to network with partners and ensure their systems use secure, effective technology.
The challenge is greater and more imminent for companies that fall under strict regulatory guidelines such as:
- Gramm-Leach-Bliley Act (protecting the privacy of financial data)
- Health Insurance Portability and Accountability Act (HIPAA), which protects the privacy of patient information in the healthcare industry
- Sarbanes-Oxley (mandating accurate financial reporting and record-keeping among public companies).
Data centers can save money by combining encryption and compression. Encryption usually increases the file size, making pure encryption security solutions infrastructure-intensive. But combining the two techniques yields significant file size reductions. Figure 3 shows an example of a document encrypted with a typical industrial-strength algorithm.
When encryption is used alone, the resulting files are typically significantly larger than the originals. If the same files are compressed before being encrypted, each file is significantly smaller than when encryption is used alone. In the sample data shown in Figure 4, each file is more than 50 percent smaller than its original size.
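The reason compression must come before encryption can be shown in a short sketch. Ciphertext looks random, so compressing it afterward gains nothing; compressing first squeezes out the redundancy before the cipher randomizes the bytes. The cipher below is a deliberately toy XOR keystream, standing in for a real algorithm such as AES, and the payload is made up; only the relative sizes matter.

```python
import hashlib
import zlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Illustrative XOR stream cipher -- NOT secure. A real solution would
    use AES; this stand-in only needs to make the output look random."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, keystream))

plain = b"quarterly results: region, revenue, cost\n" * 500

# Encrypt first: the ciphertext is effectively random, so compressing it
# afterwards gains essentially nothing.
encrypt_then_compress = zlib.compress(toy_encrypt(plain, b"secret"))

# Compress first, then encrypt: the redundancy is removed before the
# cipher scrambles the bytes, so the final file is far smaller.
compress_then_encrypt = toy_encrypt(zlib.compress(plain), b"secret")

print(len(plain), len(encrypt_then_compress), len(compress_then_encrypt))
```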
For the typical IT organization, large file transfers over the network are usually scheduled at night to minimize the disruption to network performance for the organization's users. After compressing these files, sometimes by as much as 90 percent or more, the transfer requires less dedicated bandwidth, which often translates directly into cost savings and gives the IT organization more flexibility in scheduling large file transfers.
Here are some best practices IT professionals can follow in selecting a ZIP-based data security solution:
- Evaluate both the product and the vendor: The product should offer native platform-specific features and support, and should not offload all its responsibilities to another system, such as a PC server. The vendor should be experienced in storage technology, with a strong track record for supporting customers with operations similar to yours.
- Insist on strong security: Strong security— using either passwords or digital certificates—is the industry standard for the protection of data in transit or in storage. The risks are too high to use anything less.
- Check for data integrity: Encryption serves to scramble data so it's not decipherable by prying eyes. However, data integrity is paramount. The loss of one bit of data compromises the data transfer and raises suspicions about the integrity of the entire process. ZIP vendors that serve enterprise customers rely on error-checking mechanisms such as CRC-32, a standard data integrity calculation that applies a series of bitwise operations to a block of data to produce a fixed-size value representing the original data in a file. A good CRC-32 process records the checksum of each file as it's added to the archive and recomputes it on extraction, ensuring the data wasn't altered in between. Users or administrators are alerted about any discrepancy.
- Demand efficiency: Data compression makes files smaller. Encryption tends to increase the size of files. Combining encryption with data compression creates a secure file significantly smaller than the original. The chance of a transmission error is greatly reduced when files that typically take hours to send can instead be sent in minutes.
- Ensure security of data at rest: While securing data in transit gets all the attention, data in storage and archives remains at risk. Don't settle for solutions that impose large upfront overhead.
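The CRC-32 check described in the list above can be sketched with Python's zlib.crc32. The sample data is made up; the point is that the stored checksum is recomputed and compared, and that even a single flipped bit is detected.

```python
import zlib

data = b"payroll batch 2004-06-01"

# The archiver records a CRC-32 of each file's original bytes.
stored_crc = zlib.crc32(data) & 0xFFFFFFFF

# On extraction, the CRC is recomputed and compared against the stored value.
assert (zlib.crc32(data) & 0xFFFFFFFF) == stored_crc

# CRC-32 detects all single-bit errors: one flipped bit changes the checksum.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
assert (zlib.crc32(corrupted) & 0xFFFFFFFF) != stored_crc
```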
The security scrap heap is full of solutions that didn't work. ZIP compression, already in use in virtually all data centers today, provides a robust platform for delivering strong security that is reliable, deployable, and usable. The traditional ZIP standard includes password-based security. Several companies have extended this further to include strong security, including the U.S. government's Advanced Encryption Standard (AES). By using ZIP to secure and compress data in storage, data centers can reduce the security risk and pack more data into the same media. Deploying a ZIP implementation requires minimal infrastructure, training, and support. By adding this layer of data security to the security mix, your organization can count on further protection for your most important enterprise asset: data.
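The basic ZIP workflow the article describes, compress on one platform and verify and extract on another, can be sketched with Python's standard zipfile module. Note that the standard module writes unencrypted archives only; password- and AES-protected ZIP files require vendor or third-party tooling. The file name and contents here are made up for illustration.

```python
import io
import zipfile

payload = "region,revenue\neast,100\nwest,200\n" * 200

# Build a ZIP archive in memory with DEFLATE compression, the same
# format understood on mainframe, Unix, and desktop platforms alike.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("report.txt", payload)

# Reopen the archive, inspect the size savings, and extract the member.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("report.txt")
    print(info.file_size, "->", info.compress_size, "bytes")
    # zipfile checks each member's stored CRC-32 automatically on read.
    restored = archive.read("report.txt")

assert restored.decode() == payload  # lossless round trip across the archive
```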