lz4.block sub-package

This sub-package provides the capability to compress and decompress data using the LZ4 block specification.

Because the LZ4 block format doesn’t define a container format, the Python bindings will by default insert the original data size as an integer at the start of the compressed payload. However, it is possible to disable this functionality, and you may wish to do so for compatibility with other language bindings, such as the Java bindings.

Example usage

Using the lz4 block format bindings is straightforward:

>>> import lz4.block
>>> import os
>>> input_data = 20 * 128 * os.urandom(1024)  # 2560 KiB: one random 1 KiB chunk repeated 20 * 128 times
>>> compressed_data = lz4.block.compress(input_data)
>>> output_data = lz4.block.decompress(compressed_data)
>>> input_data == output_data
True

In this simple example, the size of the uncompressed data is stored in the compressed data, and that size is used during decompression to size the output buffer correctly. You may instead prefer not to store the size of the uncompressed data, for example to ensure compatibility with the Java bindings. The example below demonstrates how to use the block format without storing the size of the uncompressed data.

>>> import lz4.block
>>> data = b'0' * 255
>>> compressed = lz4.block.compress(data, store_size=False)
>>> decompressed = lz4.block.decompress(compressed, uncompressed_size=255)
>>> decompressed == data
True

The uncompressed_size argument specifies an upper bound on the size of the uncompressed data rather than an exact value, so the following example also works.

>>> import lz4.block
>>> data = b'0' * 255
>>> compressed = lz4.block.compress(data, store_size=False)
>>> decompressed = lz4.block.decompress(compressed, uncompressed_size=2048)
>>> decompressed == data
True

A common situation is not knowing the size of the uncompressed data at decompression time. The following example illustrates a strategy that can be used in this case.

>>> import lz4.block
>>> data = b'0' * 2048
>>> compressed = lz4.block.compress(data, store_size=False)
>>> usize = 255
>>> max_size = 4096
>>> while True:
...     try:
...         decompressed = lz4.block.decompress(compressed, uncompressed_size=usize)
...         break
...     except lz4.block.LZ4BlockError:
...         usize *= 2
...         if usize > max_size:
...             print('Error: data too large or corrupt')
...             break
>>> decompressed == data
True

In this example we catch the lz4.block.LZ4BlockError exception. This exception is raised if the LZ4 library call fails, which can happen either because the buffer used to store the uncompressed data (as set by usize) is too small, or because the compressed input is invalid. It is not possible to distinguish the two cases, which is why we set an absolute upper bound (max_size) on the memory that can be allocated for the uncompressed data. Without this precaution, the code, if passed invalid compressed data, would keep trying to allocate larger and larger buffers for decompression until the system ran out of memory.

Contents

lz4.block.compress(source, mode='default', acceleration=1, compression=0, return_bytearray=False)

Compress source, returning the compressed data as a bytes or bytearray object. Raises an exception if any error occurs.

Parameters:

source (str, bytes or buffer-compatible object) – Data to compress

Keyword Arguments:
 
  • mode (str) – If 'default' or unspecified, use the default LZ4 compression mode. Set to 'fast' to use the fast LZ4 compression mode at the expense of compression ratio. Set to 'high_compression' to use the LZ4 high-compression mode at the expense of speed.
  • acceleration (int) – When mode is set to 'fast', this argument specifies the acceleration. The larger the acceleration, the faster the compression but the lower the compression ratio. The default compression mode corresponds to an acceleration of 1.
  • compression (int) – When mode is set to 'high_compression', this argument specifies the compression level. Valid values are between 1 and 12. Values between 4 and 9 are recommended, and 9 is the default.
  • store_size (bool) – If True (the default) then the size of the uncompressed data is stored at the start of the compressed block.
  • return_bytearray (bool) – If False (the default) then the function will return a bytes object. If True, then the function will return a bytearray object.
  • dict (str, bytes or buffer-compatible object) – If specified, perform compression using this initial dictionary.
Returns:

Compressed data.

Return type:

bytes or bytearray

lz4.block.decompress(source, uncompressed_size=-1, return_bytearray=False)

Decompress source, returning the uncompressed data as a bytes or bytearray object. Raises an exception if any error occurs.

Parameters:

source (str, bytes or buffer-compatible object) – Data to decompress.

Keyword Arguments:
 
  • uncompressed_size (int) – If not specified or negative, the uncompressed data size is read from the start of the source block. If specified, the whole of source is assumed to be compressed data, and the value is treated as the maximum size of the buffer used to hold the uncompressed data, so less data may be returned. If uncompressed_size is too small, LZ4BlockError is raised. By catching LZ4BlockError it is possible to increase uncompressed_size and try again.
  • return_bytearray (bool) – If False (the default) then the function will return a bytes object. If True, then the function will return a bytearray object.
  • dict (str, bytes or buffer-compatible object) – If specified, perform decompression using this initial dictionary.
Returns:

Decompressed data.

Return type:

bytes or bytearray

Raises:

LZ4BlockError – raised if the call to the LZ4 library fails. This can be caused by uncompressed_size being too small, or by invalid data.