Abstract
Compression refers to reducing the quantity of data, that is the number of bits used to represent, store and/or transmit file content, without excessively
reducing the quality of the original data. Data compression techniques can be lossless, which enables exact reproduction of the
original data on decompression, or lossy, which does not. Text data compression techniques are normally lossless. There are many
text data compression algorithms, and it is essential that the best ones be identified and evaluated towards optimizing compression
purposes. This research focused on proffering key compression algorithms and evaluating them towards ensuring the selection of
better algorithms over poorer ones. Dominant text data compression algorithms employing statistical and dictionary-based
compression techniques were determined through qualitative analysis of extant literature, examined for convergence and exposition using
an inductive approach. The proffered algorithms were implemented in Java and evaluated on compression ratio,
compression time and decompression time using text files obtained from the Canterbury corpus. Huffman and Arithmetic coding
were proffered for the statistical technique, and Lempel-Ziv-Welch (LZW) for the dictionary-based technique. LZW was indicated as the
best, with the highest compression ratio of 2.36314, followed by Arithmetic coding with a compression ratio of 1.70534, and Huffman coding with a
compression ratio of 1.62877. It was noted that the performance of the data compression algorithms on compression time and
decompression time depends on the characteristics of the files, that is, the distinct symbols and their frequencies. Usage of LZW in
data storage and transmission would go a long way towards optimizing compression.
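For illustration, the listing below is a minimal sketch of an LZW encoder in Java, the language used for the implementations reported here, together with a simplified compression-ratio estimate computed as input symbols divided by emitted codes. The class name, the sample string and this simplified ratio measure are assumptions made for exposition and are not taken from the paper's implementation, which measures ratios on Canterbury corpus files.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal LZW encoder sketch (illustrative only; not the paper's exact implementation).
public class LzwSketch {

    // Encodes the input text into a list of dictionary codes.
    static List<Integer> encode(String text) {
        // Initialise the dictionary with all single-character strings (codes 0-255).
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) {
            dict.put(String.valueOf((char) i), i);
        }
        int nextCode = 256;

        List<Integer> output = new ArrayList<>();
        String current = "";
        for (char c : text.toCharArray()) {
            String candidate = current + c;
            if (dict.containsKey(candidate)) {
                // Longest match so far; keep extending it.
                current = candidate;
            } else {
                // Emit the code for the longest match and register the new phrase.
                output.add(dict.get(current));
                dict.put(candidate, nextCode++);
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) {
            output.add(dict.get(current));
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, chosen only to show repeated phrases being reused.
        String sample = "TOBEORNOTTOBEORTOBEORNOT";
        List<Integer> codes = encode(sample);
        // Simplified ratio: input characters per emitted code; higher means better compression.
        double ratio = (double) sample.length() / codes.size();
        System.out.println("codes: " + codes.size() + ", ratio ~ " + ratio);
    }
}

The dictionary-based behaviour visible in the sketch, where repeated phrases are replaced by single codes, is what gives LZW its advantage on text with recurring symbol sequences, consistent with its leading compression ratio reported above.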