Celebrating 20 years of lzip
A magnum opus of lossy compression
Written by Christopher Hesse, April 1st, 2020
It would be hard to imagine living without the lossy file compression technology offered by lzip (not to be confused with the lossless compression program lzip).
Released exactly 20 years ago to the day, lzip took the compression world by storm. It's gone from an obscure SourceForge project to perhaps the most widely used lossy compression tool today.
Lossless compression works by removing repeated data from your file, but many files don't contain that much repeated data. lzip's insight was that most files do contain a lot of unimportant data, and that by getting rid of it, you can far surpass even the best lossless compression.
Using the powerful Lessiss-Moore algorithm, lzip sorts all of your data by how important it is and discards everything below a user-provided threshold. If you set the threshold high enough, your file may be reduced to zero size, because nothing in it was really that important.
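As a rough illustration (the importance scoring below is a made-up stand-in for this post, not anything taken from lzip itself), the core of the idea might look something like this:

```python
def lessiss_moore(data: bytes, threshold: float) -> bytes:
    """Toy sketch of threshold-based lossy compression.

    importance() is a hypothetical scoring function; the actual
    Lessiss-Moore algorithm is left as an exercise for the reader.
    """
    def importance(byte: int) -> float:
        # Placeholder: pretend every byte is equally unimportant.
        return 0.0

    # Keep only the bytes whose importance clears the user's threshold.
    # (The sorting step is omitted here; it doesn't change which bytes survive.)
    return bytes(b for b in data if importance(b) >= threshold)


# With a high enough threshold, the output is zero bytes --
# nothing in the file was really that important.
assert lessiss_moore(b"A Song of Ice and Fire", threshold=0.5) == b""
```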
To get a sense of the power of lzip, it's helpful to compare against the level of proficiency that humans have at the same task.
State of the Art for Lossy Compression
George R.R. Martin pushed the limits of manual video compression when he was able to convert the Game of Thrones epic fantasy TV series (108 GB) into a handful of books which totalled only 10 MB in size. That's a compression ratio of over 9,000:1, which is practically unheard of.
While it's true that any user of the compressed form of Game of Thrones loses some information (they may fail to imagine someone who looks exactly like Maisie Williams when they decompress the books), it's hard to argue with that compression ratio. As for the quality of the result, some users have actually claimed the compressed version is superior to the original!
lzip is, of course, able to both compress and decompress in an automated fashion using a computer and requires neither a GRRM nor a user with an accurate imagination.
Although lzip is rarely able to match GRRM's compression ratios without sacrificing quality, it can run in hours instead of 15 years and is available even to those who cannot afford a GRRM of their own.
How it Works
In order to explore how lzip accomplishes such an amazing task, I've combined a couple of deep-learning-based approaches into a simple lossy image compressor, meant as a "toy" version of lzip for images.
This approach uses a technique similar to GRRM's, except that decompression is done by a deep learning model instead of relying on the accuracy of the user's imagination.
Images are compressed into very compact text descriptions, which can then be decompressed back into images when the user wishes to view them. The compression can be done on the server ahead of time, and the decompressor can be pre-installed on the user's computer, so that the user need only download the compact description.
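A minimal sketch of that pipeline is below, wired up against the two services named in the examples that follow. The endpoints, request and response shapes, and placeholder keys are my best-guess assumptions based on the services' public documentation, not code from the original compressor:

```python
import requests

AZURE_ENDPOINT = "https://example.cognitiveservices.azure.com"  # placeholder endpoint
AZURE_KEY = "your-azure-key"      # placeholder
DEEPAI_KEY = "your-deepai-key"    # placeholder


def compress(image_bytes: bytes) -> str:
    """'Compress' an image into a short caption via Azure Computer Vision (assumed describe endpoint)."""
    resp = requests.post(
        f"{AZURE_ENDPOINT}/vision/v3.0/describe",
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=image_bytes,
    )
    resp.raise_for_status()
    return resp.json()["description"]["captions"][0]["text"]


def decompress(description: str) -> bytes:
    """'Decompress' by generating an image from the caption via DeepAI's text2img (assumed endpoint)."""
    resp = requests.post(
        "https://api.deepai.org/api/text2img",
        headers={"api-key": DEEPAI_KEY},
        data={"text": description},
    )
    resp.raise_for_status()
    # The response is assumed to contain a URL to the generated image; fetch it.
    return requests.get(resp.json()["output_url"]).content
```

The "compressed file" here is just a sentence of text, which is why the ratios look so good.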
Here are a few examples using Azure Computer Vision and DeepAI's Text to Image API:
Amazing! With compression ratios like these, it's easy to see why lzip's lossy compression is so popular.
Hopefully, this gives you a good idea of how lzip does its magic. Now get out there and compress some files!
all code samples on this site are in the public domain unless otherwise stated