Data Compression

Data compressing data is an increasingly important field in computer science, due to the increasing need for content over expanding networks. One of the simplest lossless compression algorithms is Huffman Encoding, a greedy approach to finding an optimal prefix code.

In an assignment in second year, I was tasked with implementing this algorithm in Python. I was able to achieve good compression levels with a minimal constant footprint of the dictionary in constant space.

Upon further optimisation of the system, I was able to perform more vigorous testing on larger datasets. Optimisations included profiling the code to be able to process the larger files like the Wikipedia page titles achieving 65% compression ratio on a file of nearly 300MB. It also included space optimisations like using the Python function xrange over range due to the memory limitations of certain versions of Python.

The project was a success in both educating information theory and challenging my skill in Python to work efficiently.

Share