Quickstart

Simple usage

The recommended binding to use is the LZ4 frame format binding, since this provides interoperability with other implementations and language bindings.

The simplest way to use the frame bindings is via the compress() and decompress() functions:

>>> import os
>>> import lz4.frame
>>> input_data = 20 * 128 * os.urandom(1024)  # 20 * 128 KiB of test data
>>> compressed = lz4.frame.compress(input_data)
>>> decompressed = lz4.frame.decompress(compressed)
>>> decompressed == input_data
True

The compress() function reads the input data, compresses it, and returns an LZ4 frame. A frame consists of a header, a sequence of blocks of compressed data, and a frame end marker, optionally followed by a checksum of the uncompressed data. The decompress() function takes a full LZ4 frame, decompresses it (optionally verifying the uncompressed data against the stored checksum), and returns the uncompressed data.

Working with data in chunks

It’s often inconvenient to hold the full data in memory, and so functions are also provided to compress and decompress data in chunks:

>>> import lz4.frame
>>> import os
>>> input_data = 20 * 128 * os.urandom(1024)
>>> c_context = lz4.frame.create_compression_context()
>>> compressed = lz4.frame.compress_begin(c_context)
>>> compressed += lz4.frame.compress_chunk(c_context, input_data[:10 * 128 * 1024])
>>> compressed += lz4.frame.compress_chunk(c_context, input_data[10 * 128 * 1024:])
>>> compressed += lz4.frame.compress_flush(c_context)

Here a compression context is created first; it maintains state across calls to the LZ4 library and is an opaque PyCapsule object. compress_begin() starts a new frame and returns the frame header. compress_chunk() compresses input data and returns the compressed data. compress_flush() ends the frame and returns the frame end marker. The data returned from these functions is concatenated to form the compressed frame.

compress_flush() also flushes any buffered data; by default, compress_chunk() may buffer data until a block is full. This buffering can be disabled by specifying auto_flush=True when calling compress_begin(). Alternatively, the LZ4 buffers can be flushed at any time without ending the frame by calling compress_flush() with end_frame=False.

Decompressing data can also be done in a chunked fashion:

>>> d_context = lz4.frame.create_decompression_context()
>>> d1, b, e = lz4.frame.decompress_chunk(d_context, compressed[:len(compressed)//2])
>>> d2, b, e = lz4.frame.decompress_chunk(d_context, compressed[len(compressed)//2:])
>>> d1 + d2 == input_data
True

Note that decompress_chunk() returns a tuple (decompressed_data, bytes_read, end_of_frame_indicator). decompressed_data is the decompressed data, and bytes_read reports the number of bytes read from the compressed input. end_of_frame_indicator is True if the end-of-frame marker is encountered during the decompression, and False otherwise. If the end-of-frame marker is encountered in the input, no attempt is made to decompress the data after the marker.

Rather than managing compression and decompression context objects manually, it is more convenient to use the LZ4FrameCompressor and LZ4FrameDecompressor classes which provide context manager functionality:

>>> import lz4.frame
>>> import os
>>> input_data = 20 * 128 * os.urandom(1024)
>>> with lz4.frame.LZ4FrameCompressor() as compressor:
...     compressed = compressor.begin()
...     compressed += compressor.compress(input_data[:10 * 128 * 1024])
...     compressed += compressor.compress(input_data[10 * 128 * 1024:])
...     compressed += compressor.flush()
>>> with lz4.frame.LZ4FrameDecompressor() as decompressor:
...     decompressed = decompressor.decompress(compressed[:len(compressed)//2])
...     decompressed += decompressor.decompress(compressed[len(compressed)//2:])
>>> decompressed == input_data
True

Working with compressed files

The frame bindings can also work with files containing LZ4 frame compressed data. This functionality is intended to be a drop-in replacement for that offered in the Python standard library for bz2, gzip and LZMA compressed files. The lz4.frame.open() function is the most convenient way to work with compressed data files:

>>> import lz4.frame
>>> import os
>>> input_data = 20 * os.urandom(1024)
>>> with lz4.frame.open('testfile', mode='wb') as fp:
...     bytes_written = fp.write(input_data)
...     bytes_written == len(input_data)
True
>>> with lz4.frame.open('testfile', mode='r') as fp:
...     output_data = fp.read()
>>> output_data == input_data
True

The library also provides the class lz4.frame.LZ4FrameFile for working with compressed files.

Controlling the compression

Beyond the basic usage described above, there are a number of keyword arguments to tune and control the compression. A few of the key ones are listed below; please see the documentation for full details of the options.

Controlling the compression level

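The compression_level argument specifies the level of compression used, with 0 (the default) being the lowest compression (levels 0 to 2 behave identically) and 16 the highest. Values below 0 enable “fast acceleration”, proportional to the value. Values above 16 are treated as 16. The following module constants are provided as a convenience:

  • lz4.frame.COMPRESSIONLEVEL_MIN: minimum compression (0, the default)
  • lz4.frame.COMPRESSIONLEVEL_MINHC: minimum high-compression mode (3)
  • lz4.frame.COMPRESSIONLEVEL_MAX: maximum compression (16)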

Availability: lz4.frame.compress(), lz4.frame.compress_begin(), lz4.frame.open(), lz4.frame.LZ4FrameCompressor, lz4.frame.LZ4FrameFile.

Controlling the block size

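The block_size argument specifies the maximum block size to use for the blocks in a frame. Options:

  • lz4.frame.BLOCKSIZE_DEFAULT: the LZ4 library default
  • lz4.frame.BLOCKSIZE_MAX64KB: 64 kB
  • lz4.frame.BLOCKSIZE_MAX256KB: 256 kB
  • lz4.frame.BLOCKSIZE_MAX1MB: 1 MB
  • lz4.frame.BLOCKSIZE_MAX4MB: 4 MB

If unspecified, the block size defaults to lz4.frame.BLOCKSIZE_DEFAULT, which is currently equal to lz4.frame.BLOCKSIZE_MAX64KB.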

Availability: lz4.frame.compress(), lz4.frame.compress_begin(), lz4.frame.open(), lz4.frame.LZ4FrameCompressor, lz4.frame.LZ4FrameFile.

Controlling block linking

The block_linked argument specifies whether to use block-linked compression. If True, each block may reference data from the preceding blocks, which improves the compression ratio, particularly for small blocks. The default is True.

Availability: lz4.frame.compress(), lz4.frame.compress_begin(), lz4.frame.open(), lz4.frame.LZ4FrameCompressor, lz4.frame.LZ4FrameFile.

Data checksum validation

The content_checksum argument specifies whether to enable checksumming of the uncompressed content. If True, a checksum of the uncompressed data is stored at the end of the frame, and checked during decompression. Default is False.

The block_checksum argument specifies whether to enable checksumming of the uncompressed content of each individual block in the frame. If True, a checksum is stored at the end of each block in the frame, and checked during decompression. Default is False.

Availability: lz4.frame.compress(), lz4.frame.compress_begin(), lz4.frame.open(), lz4.frame.LZ4FrameCompressor, lz4.frame.LZ4FrameFile.

Data buffering

The LZ4 library can be set to buffer data internally until a block is filled in order to optimize compression. The auto_flush argument specifies whether the library should buffer input data or not.

When auto_flush is False the LZ4 library may buffer data internally. In this case, the compression functions may return no compressed data when called. This is the default.

When auto_flush is True, the compression functions will return compressed data immediately.

Availability: lz4.frame.compress(), lz4.frame.compress_begin(), lz4.frame.open(), lz4.frame.LZ4FrameCompressor, lz4.frame.LZ4FrameFile.

Storing the uncompressed source data size in the frame

The store_size and source_size arguments allow for storing the size of the uncompressed data in the frame header. Storing the source size in the frame header adds an extra 8 bytes to the size of the compressed frame, but allows the decompression functions to better size memory buffers during decompression.

If store_size is True the size of the uncompressed data will be stored in the frame header. Default is True.

Availability of store_size: lz4.frame.compress()

The source_size argument optionally specifies the uncompressed size of the source data to be compressed. If specified, the size will be stored in the frame header.

Availability of source_size: lz4.frame.LZ4FrameCompressor.begin(), lz4.frame.compress_begin(), lz4.frame.open(), lz4.frame.LZ4FrameFile.

Working with streamed compressed data

The stream bindings provide capability for working with stream compressed LZ4 data. This functionality is based on the usage of a ring-buffer (not implemented yet) or a double-buffer, with the length of each block preceding the compressed payload in the stream.

Stream compression reuses a single context across the processed blocks to improve performance.

Most of the arguments used to initialize the LZ4 stream context are shared with the block API. Hereafter, those specific to the LZ4 stream API are detailed.

Controlling the buffer size

The buffer_size argument represents the base buffer size used internally for memory allocation:

  • In the case of the double-buffer strategy, this is the size of each buffer of the double-buffer.

When compressing, this size is the maximal length of the input uncompressed chunks.

When decompressing, this size is the maximal length of the decompressed data.

Storing the compressed data size in the block

The store_comp_size argument sets the width, in bytes, of the length field that records the size of each compressed block and is prepended to the actual LZ4 compressed payload. The field can be 1, 2 or 4 bytes wide, or 0 if the block size is recorded out-of-band.
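To make the length-prefix layout concrete, here is a conceptual sketch written with the standard library only. It does not call the stream bindings. The helper names and the little-endian byte order are assumptions made for illustration, and the payloads stand in for compressed blocks.

```python
import struct

# Assumption: length fields are little-endian; the real byte order used by the
# stream bindings is not stated in this document.
_LENGTH_FORMATS = {1: "<B", 2: "<H", 4: "<I"}


def add_length_prefix(payload, store_comp_size):
    """Prepend a length field of store_comp_size bytes to a block payload."""
    if store_comp_size == 0:
        return payload  # the caller records the block size out-of-band
    fmt = _LENGTH_FORMATS[store_comp_size]
    # struct.pack raises struct.error if the length does not fit in the field.
    return struct.pack(fmt, len(payload)) + payload


def split_first_block(buffer, store_comp_size):
    """Return (payload, remainder) for the first length-prefixed block."""
    fmt = _LENGTH_FORMATS[store_comp_size]
    (length,) = struct.unpack_from(fmt, buffer)
    start = struct.calcsize(fmt)
    return buffer[start:start + length], buffer[start + length:]


stream = add_length_prefix(b"abc", 2) + add_length_prefix(b"defgh", 2)
first, rest = split_first_block(stream, 2)
second, rest = split_first_block(rest, 2)
```

A narrower field saves a few bytes per block but limits the largest compressed block that can be recorded: 255 bytes for a 1-byte field and 65535 bytes for a 2-byte field.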