zcode 0.0.1

Creator: bradpython12

Description:

Zee Code
ZCode is a custom compression algorithm I originally developed for a competition held in the Spring 2019 Data Structures
and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, in which I took
first place. The code is fairly slow and leaves a lot of room for optimization, but it is quite readable. It can be an
excellent educational resource for anyone starting out with compression algorithms.
The algorithm is a cocktail of classical compression algorithms, mixed and served for Unicode documents. It hinges on
the LZW algorithm to create a finite-size symbol dictionary; the results are then byte-coded into variable-length custom
symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is
encoded into a byte stream.
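To make the LZW step concrete, here is a minimal textbook LZW sketch over Unicode characters with a capped dictionary. It is not taken from the ZCode source: the function names, the `max_dict_size` parameter, and the choice to seed the dictionary with the characters found in the input are all assumptions made for illustration.

```python
def lzw_compress(text, max_dict_size=4096):
    """Textbook LZW over characters (illustrative, not ZCode's own code)."""
    # Seed the dictionary with every distinct character in the input.
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in table:
            # Keep extending the current match while it is a known symbol.
            current = candidate
        else:
            codes.append(table[current])
            # Grow the dictionary only while it is below the size cap.
            if len(table) < max_dict_size:
                table[candidate] = len(table)
            current = ch
    if current:
        codes.append(table[current])
    return alphabet, codes


def lzw_decompress(alphabet, codes, max_dict_size=4096):
    """Rebuild the text by replaying the same dictionary growth."""
    if not codes:
        return ""
    table = list(alphabet)
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        out.append(entry)
        if len(table) < max_dict_size:
            table.append(previous + entry[0])
        previous = entry
    return "".join(out)
```

Capping the dictionary is what keeps the symbol table finite; once the cap is reached, both sides simply stop adding entries and stay in sync.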
Zee codes are heavily inspired by Huffman trees. However, in ordinary texts, symbols are usually much more uniformly
distributed than the geometric (or exponential) distribution that Huffman coding assumes in order to be effective, so the
gains of variable-sized byte codes, in both implementation simplicity and performance, outweighed bit-level Huffman encoding.
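As a rough illustration of the byte-code idea, the sketch below assigns one-byte codes to the most frequent symbols and two-byte codes to the rest. The exact layout of zee codes is not described here, so this scheme (a high bit marking two-byte codes, the limits, and all function names) is a hypothetical stand-in, not the actual format.

```python
from collections import Counter

ONE_BYTE_LIMIT = 128                 # ranks 0..127: one byte, high bit clear
TWO_BYTE_LIMIT = 128 + 128 * 256     # later ranks: two bytes, high bit set


def rank_symbols(symbols):
    """Order distinct symbols from most to least frequent."""
    ordered = [s for s, _ in Counter(symbols).most_common()]
    if len(ordered) > TWO_BYTE_LIMIT:
        raise ValueError("too many distinct symbols for two-byte codes")
    return ordered


def encode_symbols(symbols, ordered):
    """Emit 1 byte for frequent symbols and 2 bytes for the others."""
    rank = {s: i for i, s in enumerate(ordered)}
    out = bytearray()
    for s in symbols:
        r = rank[s]
        if r < ONE_BYTE_LIMIT:
            out.append(r)
        else:
            r -= ONE_BYTE_LIMIT
            out.append(0x80 | (r >> 8))   # high bit flags a two-byte code
            out.append(r & 0xFF)
    return bytes(out)


def decode_symbols(data, ordered):
    """Invert encode_symbols using the same frequency ordering."""
    out = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            out.append(ordered[first])
            i += 1
        else:
            r = ((first & 0x7F) << 8) | data[i + 1]
            out.append(ordered[r + ONE_BYTE_LIMIT])
            i += 2
    return out
```

Working on whole bytes avoids bit-level packing entirely, which is the implementation advantage the paragraph above refers to.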
Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!
Installation
ZCode can be installed with pip and only requires Python 3.6 or higher.
pip install -U zcode

Usage
You can run the algorithm on any UTF-8 encoded file using the zcode command. It automatically decompresses files
ending with a .zee extension and compresses all others into .zee files, but you can always override the default behavior
by providing optional arguments:
zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithm's buffer size for processing symbols (in bytes). It is set
automatically depending on your input file size, but you can change it as you wish. code-size controls the maximum
length of coded bytes when encoding symbols (it defaults to 2, and the same value must be provided to the algorithm upon
decompression).
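If you want to drive the command from a script, the options above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of ZCode; it only uses the flags listed in the usage line and mirrors the documented default of decompressing .zee files and compressing everything else.

```python
def default_action(input_file):
    """Mirror the documented default: .zee files are decompressed."""
    return "decompress" if input_file.endswith(".zee") else "compress"


def build_zcode_command(input_file, output=None, action=None,
                        symbol_size=None, code_size=None):
    """Build an argument list for the zcode command (hypothetical helper)."""
    cmd = ["zcode", input_file]
    if output is not None:
        cmd += ["--output", output]
    if action is not None:
        cmd += ["--action", action]
    if symbol_size is not None:
        cmd += ["--symbol-size", str(symbol_size)]
    if code_size is not None:
        cmd += ["--code-size", str(code_size)]
    return cmd
```

With zcode installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`. If you compress with a non-default code size, remember to pass the same `code_size` when building the decompression command.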
LICENSE
MIT LICENSE, see vahidzee/zcode/LICENSE

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
