- # = “ se”
- $ = “sh”
- % = “lls”
The first replacement string includes a space at the beginning. The new form of this sentence, using these symbols, looks like this:
$e#%#a $e% by the#a $ore
The compressed form is 24 characters long, a saving of 13 characters, or about 35 percent of the original 37. Decompression reverses the process: each symbol is replaced by the string it stands for, restoring the original message exactly.
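The substitution scheme can be sketched in a few lines of Python. This is a minimal illustration, not a general-purpose compressor. The function names are hypothetical. The original sentence is assumed to be "she sells sea shells by the sea shore", recovered by expanding the compressed string above. The sketch also assumes that the original text never contains the symbol characters #, $ or %.

```python
# Symbol table from the example, applied in this order. Assumes the
# original text never contains the symbol characters #, $ or %.
SUBSTITUTIONS = [("#", " se"), ("$", "sh"), ("%", "lls")]

def compress(text: str) -> str:
    """Replace each pattern with its one-character symbol."""
    for symbol, pattern in SUBSTITUTIONS:
        text = text.replace(pattern, symbol)
    return text

def decompress(text: str) -> str:
    """Expand each symbol back into the pattern it stands for."""
    for symbol, pattern in SUBSTITUTIONS:
        text = text.replace(symbol, pattern)
    return text

original = "she sells sea shells by the sea shore"  # assumed original sentence
packed = compress(original)
saving = len(original) - len(packed)
print(packed)                              # $e#%#a $e% by the#a $ore
print(len(original), len(packed), saving)  # 37 24 13
print(f"{saving / len(original):.0%}")     # 35%
print(decompress(packed) == original)      # True
```

Decompression is a reverse lookup, so the compressor and decompressor must share the symbol table. This sketch does not count the table's size against the savings.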
How Does It Work?
Compression is accomplished using algorithms designed to achieve the best results for a given type of file. An algorithm is the sequence of steps needed to compress a file, and a compression program carries out those steps. Most compression algorithms are more complex than the previous example, but they rely on the same principle: finding repeating patterns and replacing them with a more efficient encoding.
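Run-length encoding (RLE) is one of the simplest techniques that follows this principle. It replaces a run of identical characters with a single character and a count. The Python sketch below illustrates only that general idea. It is not claimed to be the algorithm behind any particular product, and the function names and sample row are hypothetical.

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            # Same character as the current run: extend it.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # A new character starts a new run.
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand each (character, count) pair back into its run."""
    return "".join(ch * count for ch, count in runs)

row = "W" * 10 + "B" * 3 + "W" * 12  # hypothetical row of white and black pixels
encoded = rle_encode(row)
print(encoded)                     # [('W', 10), ('B', 3), ('W', 12)]
print(rle_decode(encoded) == row)  # True
```

RLE only pays off when the data actually contains long runs. On text with few repeated characters it can make the output larger, which is one reason practical compressors use more sophisticated pattern matching.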
Different file types typically compress by different amounts. For any file, regardless of type, the amount of compression depends on how much redundant information it contains: files with a high level of redundancy compress more than files with little redundancy. As a general rule, binary data compresses well, text data compresses better, and databases compress best.
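You can check the link between redundancy and compression with Python's standard `zlib` module, which implements the DEFLATE algorithm. In this sketch, a short repeated pattern stands in for highly redundant data and random bytes stand in for data with almost no redundancy. These inputs are assumptions for illustration, not the file types discussed above, and the exact numbers will vary from run to run.

```python
import os
import zlib

def compressed_fraction(data: bytes) -> float:
    """Return the compressed size as a fraction of the original size (DEFLATE via zlib)."""
    return len(zlib.compress(data, 9)) / len(data)

size = 100_000
redundant = b"ABCD" * (size // 4)  # one short pattern repeated many times
random_bytes = os.urandom(size)    # essentially no redundancy

print(f"redundant: {compressed_fraction(redundant):.3f}")
print(f"random:    {compressed_fraction(random_bytes):.3f}")
```

The repeated pattern should shrink to a tiny fraction of its size. The random bytes should come out roughly the same size or slightly larger, because there is no redundancy to remove and the format adds a small amount of overhead.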
There are two main types of compression: lossy and lossless.
Lossy compression assumes that it's acceptable to discard some of the original data to achieve more efficient compression. As a result, a decompressed file is not an exact copy of the original, because some of the data was "lost" during compression. This assumption is valid for some types of data, such as images. For example, when an image file is compressed with lossy compression, the process may discard some of the image data to achieve greater compression. Typically, the human eye cannot detect the difference between the image generated from the original file and the one generated from the decompressed file. Where some loss of data is acceptable, size reductions of more than 90 percent are common.
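One common way to discard data on purpose is quantization. Each sample is rounded to a coarser set of levels, and any detail finer than the step size is thrown away. The sketch below applies quantization to a hypothetical row of 8-bit pixel values, then hands the result to `zlib`. The step size, sample data, and function names are assumptions, and real image formats use considerably more elaborate techniques.

```python
import random
import zlib

STEP = 16  # hypothetical quantization step: 256 levels are reduced to 16

def lossy_compress(samples: bytes) -> bytes:
    """Quantize each 8-bit sample, discarding detail finer than STEP, then pack with zlib."""
    quantized = bytes(s // STEP for s in samples)  # levels 0..15
    return zlib.compress(quantized, 9)

def lossy_decompress(packed: bytes) -> bytes:
    """Unpack and map each level to the middle of its bucket; the discarded detail is gone."""
    return bytes(q * STEP + STEP // 2 for q in zlib.decompress(packed))

# Hypothetical "image row": a slow brightness gradient plus a little random noise.
rng = random.Random(42)
samples = bytes(min(255, i // 64 + rng.randint(0, 7)) for i in range(16_384))

packed = lossy_compress(samples)
restored = lossy_decompress(packed)
print("lossless size:", len(zlib.compress(samples, 9)))
print("lossy size:   ", len(packed))
print("exact copy:   ", restored == samples)
print("max error:    ", max(abs(a - b) for a, b in zip(samples, restored)))
```

Every restored value is within half a step of the original. However, the restored data is no longer identical to the input, which is exactly the trade-off lossy compression makes.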
Lossless compression takes the opposite approach: no data may be discarded, and the decompressed file must match the original content exactly. Lossless compression is used when the contents of a file must be faithfully reproduced. Files containing words or numbers, and files intended for further computer processing, often require lossless compression, because it would not be acceptable for any of their content to be discarded or altered. Figure 1 shows compression ratios for a sample data file.
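You can verify a lossless round trip directly: compress, decompress, and compare byte for byte. This sketch uses Python's `zlib` module and reports the percentage saved, the same measure used for the character-substitution example. The sample records and function name are hypothetical and are not the data behind Figure 1.

```python
import zlib

def lossless_round_trip(original: bytes) -> tuple[float, bool]:
    """Compress with zlib (DEFLATE), decompress, and check for an exact match.

    Returns the percentage saved and whether the round trip was exact.
    """
    packed = zlib.compress(original, 9)
    restored = zlib.decompress(packed)
    saving = 100 * (1 - len(packed) / len(original))
    return saving, restored == original

# Hypothetical sample: comma-separated records with repeated field values.
records = "".join(f"{i},ACME Corp,Springfield,ACTIVE\n" for i in range(1_000)).encode()
saving, exact = lossless_round_trip(records)
print(f"saved {saving:.1f}%, exact match: {exact}")
```

On very small or already compressed inputs the saving can be negative, but the round trip is still exact. That guarantee is what distinguishes lossless compression.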
The benefits of data compression aren't limited to a specific operating system. The data used in the analysis above is based primarily on common desktop file formats, but compression is also available on larger enterprise platforms such as mainframes, AS/400, Unix, and Linux systems. Results similar to those shown in Figure 2 can be achieved on these platforms.