'Data compression' only sounds complicated. Don't be afraid: compression is a good friend for several reasons. It saves space on your hard drive, it makes data files easier to manage, and it cuts the time it takes to download files from the Internet.

Wouldn't it be nice if we could compress every file down to a few bytes? Unfortunately, there is a limit to how much a file can be compressed, and the determining factor is how random the file is. If the file is completely random and no pattern can be found in it, the shortest representation of the file is the file itself. The proof is at the end of this article. The key to compressing a file, then, is finding some sort of exploitable pattern, and most of this article explains the commonly used models for doing so.

Null suppression is the most primitive form of data compression that I have been able to find. Basically, if the data is laid out in several fields (in a spreadsheet, for example) and any of those fields contain only zeros, the program simply leaves that data out and goes directly from the empty field to the next one.

Just one step beyond null suppression is run length encoding. Run length encoding simply records how many identical elements appear in a row. It would change a set of binary data like {0011100001} into what the computer reads as (2) zeros, (3) ones, (4) zeros, (1) one. As you can see, it works on the same basic idea as null suppression: find a series of repeated values, in this case both 0s and 1s, and abbreviate it.

Once the idea of data compression took hold, more and more people started working on specific programs, and their work gave us some new premises to build on. An important one is substitution coding (also called replacement coding), invented jointly by two people: Abraham Lempel and Jacob Ziv. Most compression algorithms (a word roughly meaning 'programs') that use substitution coding have names that start with 'LZ' for Lempel-Ziv.
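Before moving on to the LZ family, it may help to see run length encoding in action. The following is only a minimal sketch in Python, not a real file format: the function names `rle_encode` and `rle_decode` are made up for this illustration, and the data is kept as a string of '0' and '1' characters rather than actual bits.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


encoded = rle_encode("0011100001")
print(encoded)              # [('0', 2), ('1', 3), ('0', 4), ('1', 1)]
print(rle_decode(encoded))  # 0011100001
```

Notice that the last run, a single 1, still costs a whole (symbol, count) pair. On data with few long runs, this kind of encoding can come out larger than the input, which fits the earlier point that compression depends on finding a pattern to exploit.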
LZ-77 is a really neat form of compression. The program starts by simply copying the source file to the new destination file, but when it recognizes a phrase of data that it has already written, it replaces the second occurrence in the destination file with directions on how to get back to the first occurrence and copy it in place of the directions. This is more commonly called sliding window compression, because the program only looks for earlier phrases within a window of recently written data that slides along as it moves through the file. LZ-78 is the compression that most people have at home.
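To make the LZ-77 "directions" idea concrete, here is a small sketch in Python. It is a simplified illustration under my own assumptions, not the format any real program uses: the names `lz77_encode` and `lz77_decode`, the window size of 32 characters, and the minimum match length of 3 are all chosen just for this example. Each set of directions is a pair (distance back, number of characters to copy).

```python
def lz77_encode(data: str, window: int = 32, min_match: int = 3) -> list:
    """Return a list of tokens: a literal character, or a (distance, length) pair."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Only search the sliding window of recently seen data.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i and overlap the text being encoded.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # directions back to the earlier copy
            i += best_len
        else:
            tokens.append(data[i])  # nothing worth pointing at: copy the character
            i += 1
    return tokens


def lz77_decode(tokens: list) -> str:
    """Rebuild the text by following each (distance, length) pair."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            for _ in range(length):
                out.append(out[-dist])  # copy one character at a time from `dist` back
        else:
            out.append(token)
    return "".join(out)


tokens = lz77_encode("abcabcabc")
print(tokens)               # ['a', 'b', 'c', (3, 6)]
print(lz77_decode(tokens))  # abcabcabc
```

The first 'abc' is copied as-is, and the remaining six characters become a single instruction: go back 3 characters and copy 6. Copying one character at a time in the decoder is what lets a match overlap the text it is producing.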