Compressing data is nothing new for the data center. Making files smaller is one of the ways IT managers ease the transmission and storage of the vast amounts of data that reside in data centers worldwide. Storage Management Subsystem (SMS) data set compression, tape compression, and other hardware-based compression technologies have long been available on the S/390, but these methods often fall short: they do not give the IT shop an easy way to move data off the host onto other computing platforms to serve the needs of various business units.
Data is commonly transferred over networks. However, electronic distribution brings security risks that do not necessarily arise when data is transferred on physical media. There should be no compromises in security when it comes to your corporate data.
ZIP, the file compression format created in 1989, has become a staple for data centers and other IT shops, with more than 20 percent of all mainframes worldwide using a ZIP solution. Compressing data into ZIP archives provides interoperability between platforms and considerable savings in DASD and bandwidth. More recently, strong encryption has also been added to the ZIP format.
The purpose of data compression is simply to make files smaller. Compression reduces file size by eliminating redundant patterns and encoding the file’s contents with symbols that require less storage space than the original data. Once a file is compressed, its content is in an encoded form, and the file cannot be used until it is decompressed. Decompression, the inverse of compression, restores the file to its original state.
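The compress/decompress round trip described above can be sketched with Python’s standard-library zipfile module. This is a minimal illustration, not a mainframe ZIP product: it assumes the Deflate method (the most common ZIP compression method), and the member name sample.txt and the repeated sample data are hypothetical, chosen only to make the redundancy obvious.

```python
import io
import zipfile

# Hypothetical, highly redundant sample data: one sentence repeated many times.
original = b"she sells sea shells by the sea shore\n" * 1000

# Compress the data into an in-memory ZIP archive using the Deflate method.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.txt", original)

# Reopen the archive and inspect the stored entry.
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("sample.txt")
    print(f"original: {info.file_size} bytes, compressed: {info.compress_size} bytes")

    # Decompression is the inverse operation: it restores the exact original bytes.
    restored = archive.read("sample.txt")

assert restored == original
```

The final assertion checks that nothing was lost: ZIP compression is lossless, so the decompressed bytes match the input exactly, while the printed sizes show how much the repetition let the encoder save.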
As a simple example of the data compression process, consider this sentence:
she sells sea shells by the sea shore
This sentence consists of 37 characters, including spaces. The spaces are important and can’t simply be thrown away, because removing them would change the meaning of the original message.
The science of compression recognizes the repeating patterns in this sentence. The combination “se” appears three times, “sh” three times, and “lls” twice. In fact, every “se” pair is preceded by a space, so the pattern can be represented as the three-character string “ se” (a space followed by “se”). These patterns define the redundancy of the message.
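The counts above can be checked with a few lines of Python. This is a minimal sketch that simply applies the built-in str.count method, which counts non-overlapping occurrences, to the example sentence; the list of patterns is taken directly from the text.

```python
sentence = "she sells sea shells by the sea shore"

# The sentence is 37 characters long, spaces included.
print(len(sentence))  # 37

# Count each repeating pattern named in the text.
patterns = ["se", "sh", "lls", " se"]
counts = {p: sentence.count(p) for p in patterns}
for pattern, count in counts.items():
    print(f"{pattern!r}: {count}")
# 'se': 3
# 'sh': 3
# 'lls': 2
# ' se': 3
```

Because the counts for “se” and “ se” agree, every occurrence of “se” follows a space, which is why the longer pattern can be encoded in place of the shorter one.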
Each of these patterns can be encoded by replacing it with a single character. For example: