1.

Give a brief overview of various data compression and archiving APIs in Python’s standard library.

Answer:

Python supports data compression with several algorithms and formats, including zlib (DEFLATE), gzip, bzip2 and LZMA. The standard library also has modules for creating and reading ZIP and tar archives.

The zlib module implements compression and decompression compatible with the zlib library, which uses the DEFLATE algorithm. The gzip module provides a simple interface for compressing and decompressing files, much like the popular GNU utilities gzip and gunzip.
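As a minimal sketch of the zlib module on its own, the snippet below round-trips some in-memory bytes through zlib.compress() and zlib.decompress(). The sample data and the choice of level 9 (the highest) are arbitrary assumptions for illustration.

```python
import zlib

# Arbitrary, deliberately repetitive sample data (an assumption for illustration).
original = b"Python is Easy. " * 20

# Compress with the highest level (0-9; the default -1 currently means 6).
compressed = zlib.compress(original, 9)

# Decompress and confirm the round trip is lossless.
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)  # True
```

Because the input is highly repetitive, the compressed result is much shorter than the 320-byte original. Data with little redundancy would shrink far less.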

The following example creates a gzip file by writing compressed data to it.

>>> import gzip
>>> data = b'Python is Easy'
>>> with gzip.open("test.txt.gz", "wb") as f:
...     f.write(data)
...
14

This creates the file “test.txt.gz” in the current directory.

In order to read this compressed file:

>>> with gzip.open("test.txt.gz", "rb") as f:
...     data = f.read()
...
>>> data
b'Python is Easy'

Note that in this example the file is opened in binary mode ('wb' for writing, 'rb' for reading), so the data is written and returned as bytes.
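gzip.open() also accepts text modes ('wt', 'rt' and so on), in which str data is encoded on write and decoded on read. The module additionally offers gzip.compress() and gzip.decompress() for data held in memory. The sketch below shows both. The file name notes.txt.gz and the UTF-8 encoding are assumptions chosen for illustration.

```python
import gzip

# Text mode ("wt"/"rt") encodes and decodes str automatically; the file name is hypothetical.
with gzip.open("notes.txt.gz", "wt", encoding="utf-8") as f:
    f.write("Python is Easy\n")

with gzip.open("notes.txt.gz", "rt", encoding="utf-8") as f:
    text = f.read()

# One-shot, in-memory compression without touching the file system.
blob = gzip.compress(b"Python is Easy")

print(text, end="")           # Python is Easy
print(gzip.decompress(blob))  # b'Python is Easy'
```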

The bz2 module implements bzip2 compression and decompression. Its primary interface consists of the open() function and the write() and read() methods of the file object that open() returns:

  1. open(): opens a bzip2-compressed file and returns a file object. The file can be opened in binary or text mode, for reading or writing.
  2. write(): the file must be opened for writing (for example in 'w', 'wb' or 'wt' mode). In binary mode ('w' or 'wb') it writes compressed binary data to the file. In text mode ('wt') the file object is wrapped in an io.TextIOWrapper object, which encodes the str data before it is compressed.
  3. read(): when the file is opened in read mode, this method reads it and returns the uncompressed data.
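The text-mode behaviour described in the list above can be seen in a short sketch. It writes and reads a str through bz2.open() using the 't' modes and checks that the returned object is an io.TextIOWrapper. The file name greeting.bz2 and the UTF-8 encoding are assumptions for illustration.

```python
import bz2
import io

# "wt" opens the compressed file in text mode; the file name is hypothetical.
with bz2.open("greeting.bz2", "wt", encoding="utf-8") as f:
    is_wrapper = isinstance(f, io.TextIOWrapper)  # text mode wraps the binary BZ2File
    f.write("Grüße aus Python\n")  # the str is encoded to UTF-8, then compressed

# "rt" decompresses and decodes back to str.
with bz2.open("greeting.bz2", "rt", encoding="utf-8") as f:
    text = f.read()

print(is_wrapper)  # True
print(text, end="")
```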

The following code writes compressed data to a bzip2 file:

>>> import bz2
>>> f = bz2.open("test.bz2", "wb")
>>> data = b'KnowledgeHut Solutions Private Limited'
>>> f.write(data)
38
>>> f.close()

This creates test.bz2 in the current directory. Decompressing it with a tool such as bunzip2 yields a single file named ‘test’. To read the uncompressed data from test.bz2 in Python, use the following code:

>>> f = bz2.open("test.bz2", "rb")
>>> data = f.read()
>>> f.close()
>>> data
b'KnowledgeHut Solutions Private Limited'
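Besides bz2.open(), the bz2 module has a BZ2Compressor class for data that arrives in pieces and cannot be compressed in one call. The sketch below assumes a made-up list of chunks standing in for, say, a network stream. The output of the incremental compressor is an ordinary bzip2 stream, so bz2.decompress() can restore it in one step.

```python
import bz2

# Hypothetical chunks, e.g. data arriving piece by piece from a stream.
chunks = [b"KnowledgeHut ", b"Solutions ", b"Private ", b"Limited"]

compressor = bz2.BZ2Compressor()
# compress() may return b"" until enough input has been buffered internally.
parts = [compressor.compress(chunk) for chunk in chunks]
# flush() emits the remaining compressed data and ends the stream.
parts.append(compressor.flush())
compressed = b"".join(parts)

print(bz2.decompress(compressed))  # b'KnowledgeHut Solutions Private Limited'
```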

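The Lempel–Ziv–Markov chain algorithm (LZMA) performs lossless data compression and typically achieves a higher compression ratio than zlib or bzip2, though compression is usually slower. Python’s lzma module provides classes and convenience functions for compressing and decompressing data with LZMA, including the .xz file format.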

The following code writes LZMA-compressed data to an .xz file:

>>> import lzma
>>> data = b"KnowledgeHut Solutions Private Limited"
>>> f = lzma.open("test.xz", "wb")
>>> f.write(data)
38
>>> f.close()

This creates ‘test.xz’ in the current working directory. To read the uncompressed data back from this file, use the following code:

>>> import lzma
>>> f = lzma.open("test.xz", "rb")
>>> data = f.read()
>>> f.close()
>>> data
b'KnowledgeHut Solutions Private Limited'
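How much better LZMA compresses depends on the data. You can check this by compressing the same input with the one-shot functions of zlib, bz2 and lzma and comparing the sizes. The sample input below is an arbitrary assumption. The printed sizes, and even their ranking, will vary with the data and the library versions, so no particular numbers are claimed here.

```python
import bz2
import lzma
import zlib

# Arbitrary sample input; real results depend heavily on the data.
data = b"".join(f"record {i}: status=ok\n".encode() for i in range(2000))

sizes = {
    "zlib": len(zlib.compress(data, 9)),
    "bz2": len(bz2.compress(data, 9)),
    "lzma": len(lzma.compress(data, preset=9)),
}

# Print the results from smallest to largest compressed size.
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name:5} {size:7d} bytes (original {len(data)})")

# All three codecs are lossless: each round trip restores the input exactly.
assert zlib.decompress(zlib.compress(data)) == data
assert bz2.decompress(bz2.compress(data)) == data
assert lzma.decompress(lzma.compress(data)) == data
```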

ZIP is one of the oldest and most popular file formats for archiving and compression. It was introduced by the PKZIP application.

The zipfile module in Python’s standard library provides the ZipFile class. Its write() method adds a file to an archive, and its read() method returns the contents of a file stored in the archive.

>>> import zipfile
>>> newzip = zipfile.ZipFile('a.zip', 'w')
>>> newzip.write('abc.txt')
>>> newzip.close()

To read data from a particular file in the archive:

>>> newzip = zipfile.ZipFile('a.zip', 'r')
>>> data = newzip.read('abc.txt')
>>> newzip.close()
>>> data
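By default, ZipFile stores members without compression (ZIP_STORED). The sketch below passes compression=zipfile.ZIP_DEFLATED to actually compress. It adds members from memory with writestr(), so no file such as abc.txt needs to exist on disk, and lists the archive's contents with namelist(). The archive and member names are hypothetical. The with statements close the archive automatically.

```python
import zipfile

# Create the archive with DEFLATE compression; all names here are hypothetical.
with zipfile.ZipFile("b.zip", "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.txt", "Python is Easy\n" * 50)  # a str is encoded as UTF-8
    zf.writestr("notes/readme.txt", b"raw bytes are stored as-is")

with zipfile.ZipFile("b.zip") as zf:  # mode defaults to "r"
    names = zf.namelist()
    info = zf.getinfo("hello.txt")
    hello = zf.read("hello.txt")  # read() always returns bytes

print(names)  # ['hello.txt', 'notes/readme.txt']
print(info.file_size, info.compress_size)  # uncompressed vs. compressed size
```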

Finally, Python’s tarfile module can create a tarball containing multiple files, optionally compressed with gzip, bzip2 or lzma.

The following example opens a tar file for writing with gzip compression and adds a file to it.

>>> import tarfile
>>> fp = tarfile.open("a.tar.gz", "w:gz")
>>> fp.add("abc.txt")
>>> fp.close()

The following code opens the archive and extracts all of its files into the current folder.

>>> fp = tarfile.open("a.tar.gz", "r:gz")
>>> fp.extractall()
>>> fp.close()
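To use a different compression algorithm, change the mode suffix: 'w:gz', 'w:bz2' or 'w:xz'. The sketch below packs several hypothetical files into one archive per scheme, reopens each archive with 'r:*' so that tarfile detects the compression itself, and lists the members with getnames(). Everything happens inside a temporary directory, so the current folder is left untouched.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical input files created just for this example.
    names = ["one.txt", "two.txt", "three.txt"]
    for name in names:
        with open(os.path.join(workdir, name), "w") as fh:
            fh.write(f"contents of {name}\n")

    members = {}
    for suffix in ("gz", "bz2", "xz"):
        archive = os.path.join(workdir, f"files.tar.{suffix}")
        with tarfile.open(archive, f"w:{suffix}") as tar:
            for name in names:
                # arcname stores only the file name, not the temporary path
                tar.add(os.path.join(workdir, name), arcname=name)
        # "r:*" lets tarfile detect the compression method automatically
        with tarfile.open(archive, "r:*") as tar:
            members[suffix] = tar.getnames()

print(members)
```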
