lz4.frame sub-package

This sub-package is in beta testing. Ahead of version 1.0 there may be API changes, but these are expected to be minimal, if any.

This sub-package provides the capability to compress and decompress data using the LZ4 frame specification.

The frame specification is recommended for most applications. A key benefit of using the frame specification (compared to the block specification) is interoperability with other implementations.

Low level bindings for full content (de)compression

These functions are bindings to the LZ4 Frame API functions for compressing data into a single frame, and decompressing a full frame of data.

lz4.frame.compress()

compress(data, compression_level=0, block_size=0, content_checksum=0, block_linked=True, store_size=True, return_bytearray=False)

Compresses data returning the compressed data as a complete frame.

The returned data includes a header and endmark and so is suitable for writing to a file.

Parameters:

data (str, bytes or buffer-compatible object) – data to compress

Keyword Arguments:
 
  • block_size (int) –

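    Specifies the maximum block size to use. Options are lz4.frame.BLOCKSIZE_DEFAULT, lz4.frame.BLOCKSIZE_MAX64KB, lz4.frame.BLOCKSIZE_MAX256KB, lz4.frame.BLOCKSIZE_MAX1MB and lz4.frame.BLOCKSIZE_MAX4MB.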

    If unspecified, will default to lz4.frame.BLOCKSIZE_DEFAULT which is currently equal to lz4.frame.BLOCKSIZE_MAX64KB.

  • block_linked (bool) – Specifies whether to use block-linked compression. If True, the compression ratio is improved, particularly for small block sizes. Default is True.
  • compression_level (int) –

    Specifies the level of compression used. Values from 0 to 16 are valid, with 0 (the default) being the lowest compression (levels 0 to 2 are equivalent) and 16 the highest. Values below 0 will enable “fast acceleration”, proportional to the value. Values above 16 will be treated as 16. The following module constants are provided as a convenience: lz4.frame.COMPRESSIONLEVEL_MIN, lz4.frame.COMPRESSIONLEVEL_MINHC and lz4.frame.COMPRESSIONLEVEL_MAX.

  • content_checksum (bool) – Specifies whether to enable checksumming of the uncompressed content. If True, a checksum is stored at the end of the frame, and checked during decompression. Default is False.
  • block_checksum (bool) –

    Specifies whether to enable checksumming of the uncompressed content of each block. If True, a checksum of the uncompressed data in each block in the frame is stored at the end of each block. If present, these checksums will be used to validate the data during decompression. The default is False, meaning block checksums are not calculated and stored. This functionality is only supported if the underlying LZ4 library has version >= 1.8.0. Attempting to set this value to True with a version of LZ4 < 1.8.0 will cause a RuntimeError to be raised.

  • return_bytearray (bool) – If True a bytearray object will be returned. If False, a string of bytes is returned. The default is False.
  • store_size (bool) – If True then the frame will include an 8-byte header field that is the uncompressed size of data included within the frame. Default is True.
Returns:

Compressed data

Return type:

bytes or bytearray

lz4.frame.decompress(data, return_bytearray=False, return_bytes_read=False)

Decompresses a frame of data and returns it as a string of bytes.

Parameters:

data (str, bytes or buffer-compatible object) – data to decompress. This should contain a complete LZ4 frame of compressed data.

Keyword Arguments:
 
  • return_bytearray (bool) – If True a bytearray object will be returned. If False, a string of bytes is returned. The default is False.
  • return_bytes_read (bool) – If True then the number of bytes read from data will also be returned. Default is False.
Returns:

Uncompressed data and optionally the number of bytes read

If the return_bytes_read argument is True this function returns a tuple consisting of:

  • bytes or bytearray: Uncompressed data
  • int: Number of bytes consumed from data

Return type:

bytes/bytearray or tuple

Low level bindings for chunked content (de)compression

These functions are bindings to the LZ4 Frame API functions allowing piece-wise compression and decompression. Using them requires managing compression and decompression contexts manually. An alternative to using these is to use the context manager classes described in the section below.

Compression

lz4.frame.create_compression_context()

Creates a compression context object.

The compression object is required for compression operations.

Returns:A compression context
Return type:cCtx
lz4.frame.compress_begin()

compress_begin(context, source_size=0, compression_level=0, block_size=0, content_checksum=0, content_size=1, block_linked=0, frame_type=0, auto_flush=1)

Creates a frame header from a compression context.

Parameters:

context (cCtx) – A compression context.

Keyword Arguments:
 
  • block_size (int) –

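    Specifies the maximum block size to use. Options are lz4.frame.BLOCKSIZE_DEFAULT, lz4.frame.BLOCKSIZE_MAX64KB, lz4.frame.BLOCKSIZE_MAX256KB, lz4.frame.BLOCKSIZE_MAX1MB and lz4.frame.BLOCKSIZE_MAX4MB.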

    If unspecified, will default to lz4.frame.BLOCKSIZE_DEFAULT which is currently equal to lz4.frame.BLOCKSIZE_MAX64KB.

  • block_linked (bool) – Specifies whether to use block-linked compression. If True, the compression ratio is improved, particularly for small block sizes. Default is True.
  • compression_level (int) –

    Specifies the level of compression used. Values from 0 to 16 are valid, with 0 (the default) being the lowest compression (levels 0 to 2 are equivalent) and 16 the highest. Values below 0 will enable “fast acceleration”, proportional to the value. Values above 16 will be treated as 16. The following module constants are provided as a convenience: lz4.frame.COMPRESSIONLEVEL_MIN, lz4.frame.COMPRESSIONLEVEL_MINHC and lz4.frame.COMPRESSIONLEVEL_MAX.

  • content_checksum (bool) – Specifies whether to enable checksumming of the uncompressed content. If True, a checksum is stored at the end of the frame, and checked during decompression. Default is False.
  • block_checksum (bool) –

    Specifies whether to enable checksumming of the uncompressed content of each block. If True, a checksum of the uncompressed data in each block in the frame is stored at the end of each block. If present, these checksums will be used to validate the data during decompression. The default is False, meaning block checksums are not calculated and stored. This functionality is only supported if the underlying LZ4 library has version >= 1.8.0. Attempting to set this value to True with a version of LZ4 < 1.8.0 will cause a RuntimeError to be raised.

  • return_bytearray (bool) – If True a bytearray object will be returned. If False, a string of bytes is returned. The default is False.
  • auto_flush (bool) – Enable or disable autoFlush. When autoFlush is disabled the LZ4 library may buffer data internally until a block is full. Default is False (autoFlush disabled).
  • source_size (int) – Optionally specifies the uncompressed size of the data to be compressed. If specified, the size will be stored in the frame header for use during decompression. Default is 0 (no size stored).
Returns:

Frame header.

Return type:

bytes or bytearray

lz4.frame.compress_chunk(context, data)

Compresses blocks of data and returns the compressed data.

The returned data should be concatenated with the data returned from lz4.frame.compress_begin and any subsequent calls to lz4.frame.compress_chunk.

Parameters:
  • context (cCtx) – compression context
  • data (str, bytes or buffer-compatible object) – data to compress
Keyword Arguments:
 

return_bytearray (bool) – If True a bytearray object will be returned. If False, a string of bytes is returned. The default is False.

Returns:

Compressed data.

Return type:

bytes or bytearray

Notes

If auto flush is disabled (auto_flush=False when calling lz4.frame.compress_begin) this function may buffer and retain some or all of the compressed data for future calls to lz4.frame.compress_chunk or lz4.frame.compress_flush.

lz4.frame.compress_flush(context, end_frame=True, return_bytearray=False)

Flushes any buffered data held in the compression context.

This flushes any data buffered in the compression context, returning it as compressed data. The returned data should be appended to the output of previous calls to lz4.frame.compress_chunk.

The end_frame argument specifies whether or not the frame should be ended. If this is True, an end of frame marker will be appended to the returned data. In this case, if content_checksum was True when calling lz4.frame.compress_begin, then a checksum of the uncompressed data will also be included in the returned data.

If the end_frame argument is True, the compression context will be reset and can be re-used.

Parameters:

context (cCtx) – Compression context

Keyword Arguments:
 
  • end_frame (bool) – If True the frame will be ended. Default is True.
  • return_bytearray (bool) – If True a bytearray object will be returned. If False, a bytes object is returned. The default is False.
Returns:

compressed data.

Return type:

bytes or bytearray

Notes

If end_frame is False but the underlying LZ4 library does not support flushing without ending the frame, a RuntimeError will be raised.

Decompression

lz4.frame.create_decompression_context()

Creates a decompression context object.

A decompression context is needed for decompression operations.

Returns:A decompression context
Return type:dCtx
lz4.frame.reset_decompression_context(context)

Resets a decompression context object.

This is useful for recovering from an error or for stopping an unfinished decompression and starting a new one with the same context.

Parameters:context (dCtx) – A decompression context
lz4.frame.decompress_chunk(context, data, max_length=-1)

Decompresses part of a frame of compressed data.

The returned uncompressed data should be concatenated with the data returned from previous calls to lz4.frame.decompress_chunk.

Parameters:
  • context (dCtx) – decompression context
  • data (str, bytes or buffer-compatible object) – part of a LZ4 frame of compressed data
Keyword Arguments:
 
  • max_length (int) – if non-negative this specifies the maximum number of bytes of uncompressed data to return. Default is -1.
  • return_bytearray (bool) – If True a bytearray object will be returned. If False, a string of bytes is returned. The default is False.
Returns:

uncompressed data, bytes read, end of frame indicator

This function returns a tuple consisting of:

  • The uncompressed data as a bytes or bytearray object
  • The number of bytes consumed from input data as an int
  • The end of frame indicator as a bool.

Return type:

tuple

The end of frame indicator is True if the end of the compressed frame has been reached, or False otherwise.

Retrieving frame information

The following function can be used to retrieve information about a compressed frame.

lz4.frame.get_frame_info(frame)

Given a frame of compressed data, returns information about the frame.

Parameters:frame (str, bytes or buffer-compatible object) – LZ4 compressed frame
Returns:Dictionary with keys:
  • block_size (int): the maximum size (in bytes) of each block
  • block_size_id (int): identifier for maximum block size
  • content_checksum (bool): specifies whether the frame contains a checksum of the uncompressed content
  • content_size (int): uncompressed size in bytes of frame content
  • block_linked (bool): specifies whether the frame contains blocks which are independently compressed (False) or linked (True)
  • block_checksum (bool): specifies whether each block contains a checksum of its contents
  • skippable (bool): whether the frame is skippable (True) or not (False)
Return type:dict

Helper context manager classes

These classes, which use the low level bindings to the Frame API, are more convenient to use. They provide context management, so it is not necessary to manually create and manage compression and decompression contexts.

class lz4.frame.LZ4FrameCompressor(block_size=0, block_linked=True, compression_level=0, content_checksum=False, block_checksum=False, auto_flush=False, return_bytearray=False)

Create a LZ4 frame compressor object.

This object can be used to compress data incrementally.

Parameters:
  • block_size (int) –

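    Specifies the maximum block size to use. Options are lz4.frame.BLOCKSIZE_DEFAULT, lz4.frame.BLOCKSIZE_MAX64KB, lz4.frame.BLOCKSIZE_MAX256KB, lz4.frame.BLOCKSIZE_MAX1MB and lz4.frame.BLOCKSIZE_MAX4MB.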

    If unspecified, will default to lz4.frame.BLOCKSIZE_DEFAULT which is equal to lz4.frame.BLOCKSIZE_MAX64KB.

  • block_linked (bool) – Specifies whether to use block-linked compression. If True, the compression ratio is improved, especially for small block sizes. If False the blocks are compressed independently. The default is True.
  • compression_level (int) –

    Specifies the level of compression used. Values from 0 to 16 are valid, with 0 (the default) being the lowest compression (levels 0 to 2 are equivalent) and 16 the highest. Values above 16 will be treated as 16. Values between 4 and 9 are recommended. The following module constants are provided as a convenience: lz4.frame.COMPRESSIONLEVEL_MIN, lz4.frame.COMPRESSIONLEVEL_MINHC and lz4.frame.COMPRESSIONLEVEL_MAX.

  • content_checksum (bool) – Specifies whether to enable checksumming of the payload content. If True, a checksum of the uncompressed data is stored at the end of the compressed frame which is checked during decompression. The default is False.
  • block_checksum (bool) – Specifies whether to enable checksumming of the content of each block. If True a checksum of the uncompressed data in each block in the frame is stored at the end of each block. If present, these checksums will be used to validate the data during decompression. The default is False, meaning block checksums are not calculated and stored. This functionality is only supported if the underlying LZ4 library has version >= 1.8.0. Attempting to set this value to True with a version of LZ4 < 1.8.0 will cause a RuntimeError to be raised.
  • auto_flush (bool) – When False, the LZ4 library may buffer data until a block is full. When True no buffering occurs, and partially full blocks may be returned. The default is False.
  • return_bytearray (bool) – When False a bytes object is returned from the calls to methods of this class. When True a bytearray object will be returned. The default is False.
begin(source_size=0)

Begin a compression frame.

The returned data contains frame header information. The data returned from subsequent calls to compress() should be concatenated with this header.

Keyword Arguments:
 source_size (int) – Optionally specify the total size of the uncompressed data. If specified, will be stored in the compressed frame header as an 8-byte field for later use during decompression. Default is 0 (no size stored).
Returns:frame header data
Return type:bytes or bytearray
compress(data)

Compresses data and returns it.

This compresses data (a bytes object), returning a bytes or bytearray object containing the compressed form of the input.

If auto_flush has been set to False, some of data may be buffered internally, for use in later calls to LZ4FrameCompressor.compress() and LZ4FrameCompressor.flush().

The returned data should be concatenated with the output of any previous calls to compress() and a single call to begin().

Parameters:data (str, bytes or buffer-compatible object) – data to compress
Returns:compressed data
Return type:bytes or bytearray
flush()

Finish the compression process.

This returns a bytes or bytearray object containing any data stored in the compressor’s internal buffers and a frame footer.

The LZ4FrameCompressor instance may be re-used after this method has been called to create a new frame of compressed data.

Returns:compressed data and frame footer.
Return type:bytes or bytearray
has_context()

Return whether the compression context exists.

Returns:
True if the compression context exists, False otherwise.
Return type:bool
reset()

Reset the LZ4FrameCompressor instance.

This allows the LZ4FrameCompressor instance to be re-used after an error.

started()

Return whether the compression frame has been started.

Returns:
True if the compression frame has been started, False otherwise.
Return type:bool
class lz4.frame.LZ4FrameDecompressor(return_bytearray=False)

Create a LZ4 frame decompressor object.

This can be used to decompress data incrementally.

For a more convenient way of decompressing an entire compressed frame at once, see lz4.frame.decompress().

Parameters:return_bytearray (bool) – When False a bytes object is returned from the calls to methods of this class. When True a bytearray object will be returned. The default is False.
eof

True if the end-of-stream marker has been reached. False otherwise.

Type:bool
unused_data

Data found after the end of the compressed stream. Before the end of the frame is reached, this will be b''.

Type:bytes
needs_input

False if the decompress() method can provide more decompressed data before requiring new compressed input. True otherwise.

Type:bool
decompress(data, max_length=-1)

Decompresses part or all of an LZ4 frame of compressed data.

The returned data should be concatenated with the output of any previous calls to decompress().

If max_length is non-negative, returns at most max_length bytes of decompressed data. If this limit is reached and further output can be produced, the needs_input attribute will be set to False. In this case, the next call to decompress() may provide data as b'' to obtain more of the output. In all cases, any unconsumed data from previous calls will be prepended to the input data.

If all of the input data was decompressed and returned (either because this was less than max_length bytes, or because max_length was negative), the needs_input attribute will be set to True.

If an end of frame marker is encountered in the data during decompression, decompression will stop at the end of the frame, and any data after the end of frame is available from the unused_data attribute. In this case, the LZ4FrameDecompressor instance is reset and can be used for further decompression.

Parameters:data (str, bytes or buffer-compatible object) – compressed data to decompress
Keyword Arguments:
 max_length (int) – If this is non-negative, this method returns at most max_length bytes of decompressed data.
Returns:Uncompressed data
Return type:bytes
reset()

Reset the decompressor state.

This is useful after an error occurs, allowing re-use of the instance.

Reading and writing compressed files

These provide the capability to read and write files using LZ4 compressed frames. They are designed to be drop-in replacements for the LZMA, BZ2 and Gzip equivalents in the Python standard library.

lz4.frame.open(filename, mode='rb', encoding=None, errors=None, newline=None, block_size=0, block_linked=True, compression_level=0, content_checksum=False, block_checksum=False, auto_flush=False, return_bytearray=False, source_size=0)

Open an LZ4Frame-compressed file in binary or text mode.

filename can be either an actual file name (given as a str, bytes, or PathLike object), in which case the named file is opened, or it can be an existing file object to read from or write to.

The mode argument can be 'r', 'rb' (default), 'w', 'wb', 'x', 'xb', 'a', or 'ab' for binary mode, or 'rt', 'wt', 'xt', or 'at' for text mode.

For binary mode, this function is equivalent to the LZ4FrameFile constructor: LZ4FrameFile(filename, mode, ...).

For text mode, an LZ4FrameFile object is created, and wrapped in an io.TextIOWrapper instance with the specified encoding, error handling behavior, and line ending(s).

Parameters:

filename (str, bytes, os.PathLike) – file name or file object to open

Keyword Arguments:
 
  • mode (str) – mode for opening the file
  • encoding (str) – the name of the encoding that will be used for encoding/decoding the stream. It defaults to locale.getpreferredencoding(False). See io.TextIOWrapper for further details.
  • errors (str) – specifies how encoding and decoding errors are to be handled. See io.TextIOWrapper for further details.
  • newline (str) – controls how line endings are handled. See io.TextIOWrapper for further details.
  • return_bytearray (bool) – When False a bytes object is returned from the calls to methods of this class. When True a bytearray object will be returned. The default is False.
  • source_size (int) – Optionally specify the total size of the uncompressed data. If specified, will be stored in the compressed frame header as an 8-byte field for later use during decompression. Default is 0 (no size stored). Only used for writing compressed files.
  • block_size (int) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • block_linked (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • compression_level (int) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • content_checksum (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • block_checksum (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • auto_flush (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
class lz4.frame.LZ4FrameFile(filename=None, mode='r', block_size=0, block_linked=True, compression_level=0, content_checksum=False, block_checksum=False, auto_flush=False, return_bytearray=False, source_size=0)

A file object providing transparent LZ4F (de)compression.

An LZ4FrameFile can act as a wrapper for an existing file object, or refer directly to a named file on disk.

Note that LZ4FrameFile provides a binary file interface: data read is returned as bytes, and data to be written must be given as bytes.

When opening a file for writing, the settings used by the compressor can be specified. The underlying compressor object is lz4.frame.LZ4FrameCompressor. See the docstrings for that class for details on compression options.

Parameters:

filename (str, bytes, PathLike, file object) – can be either an actual file name (given as a str, bytes, or PathLike object), in which case the named file is opened, or it can be an existing file object to read from or write to.

Keyword Arguments:
 
  • mode (str) – mode can be 'r' for reading (default), 'w' for (over)writing, 'x' for creating exclusively, or 'a' for appending. These can equivalently be given as 'rb', 'wb', 'xb' and 'ab' respectively.
  • return_bytearray (bool) – When False a bytes object is returned from the calls to methods of this class. When True a bytearray object will be returned. The default is False.
  • source_size (int) – Optionally specify the total size of the uncompressed data. If specified, will be stored in the compressed frame header as an 8-byte field for later use during decompression. Default is 0 (no size stored). Only used for writing compressed files.
  • block_size (int) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • block_linked (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • compression_level (int) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • content_checksum (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • block_checksum (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
  • auto_flush (bool) – Compressor setting. See lz4.frame.LZ4FrameCompressor.
close()

Flush and close the file.

May be called more than once without error. Once the file is closed, any other operation on it will raise a ValueError.

closed

Returns True if this file is closed.

Returns:True if the file is closed, False otherwise.
Return type:bool
fileno()

Return the file descriptor for the underlying file.

Returns:file descriptor for file.
Return type:int
flush()

Flush the file, keeping it open.

May be called more than once without error. The file may continue to be used normally after flushing.

peek(size=-1)

Return buffered data without advancing the file position.

Always returns at least one byte of data, unless at EOF. The exact number of bytes returned is unspecified.

Returns:uncompressed data
Return type:bytes
read(size=-1)

Read up to size uncompressed bytes from the file.

If size is negative or omitted, read until EOF is reached. Returns b'' if the file is already at EOF.

Parameters:size (int) – If non-negative, specifies the maximum number of uncompressed bytes to return.
Returns:uncompressed data
Return type:bytes
read1(size=-1)

Read up to size uncompressed bytes.

This method tries to avoid making multiple reads from the underlying stream.

This method reads up to a buffer’s worth of data if size is negative.

Returns b'' if the file is at EOF.

Parameters:size (int) – If non-negative, specifies the maximum number of uncompressed bytes to return.
Returns:uncompressed data
Return type:bytes
readable()

Return whether the file was opened for reading.

Returns:
True if the file was opened for reading, False otherwise.
Return type:bool
readline(size=-1)

Read a line of uncompressed bytes from the file.

The terminating newline (if present) is retained. If size is non-negative, no more than size bytes will be read (in which case the line may be incomplete). Returns b'' if already at EOF.

Parameters:size (int) – If non-negative, specifies the maximum number of uncompressed bytes to return.
Returns:uncompressed data
Return type:bytes
seek(offset, whence=0)

Change the file position.

The new position is specified by offset, relative to the position indicated by whence. Possible values for whence are:

  • io.SEEK_SET or 0: start of stream (default); offset must not be negative
  • io.SEEK_CUR or 1: current stream position
  • io.SEEK_END or 2: end of stream; offset must not be positive

Returns the new file position.

Note that seeking is emulated, so depending on the parameters, this operation may be extremely slow.

Parameters:
  • offset (int) – new position in the file
  • whence (int) – position with which offset is measured. Allowed values are 0, 1, 2. The default is 0 (start of stream).
Returns:

new file position

Return type:

int

seekable()

Return whether the file supports seeking.

Returns:True if the file supports seeking, False otherwise.
Return type:bool
tell()

Return the current file position.

Parameters:None
Returns:file position
Return type:int
writable()

Return whether the file was opened for writing.

Returns:
True if the file was opened for writing, False otherwise.
Return type:bool
write(data)

Write a bytes object to the file.

Returns the number of uncompressed bytes written, which is always the length of data in bytes. Note that due to buffering, the file on disk may not reflect the data written until close() is called.

Parameters:data (bytes) – uncompressed data to compress and write to the file
Returns:the number of uncompressed bytes written to the file
Return type:int

Module attributes

A number of module attributes are defined for convenience. These are detailed below.

Compression level

The following module attributes can be used when setting the compression_level argument.

lz4.frame.COMPRESSIONLEVEL_MIN

Specifier for the minimum compression level.

Specifying compression_level=lz4.frame.COMPRESSIONLEVEL_MIN will instruct the LZ4 library to use a compression level of 0.

lz4.frame.COMPRESSIONLEVEL_MINHC

Specifier for the minimum compression level for high compression mode.

Specifying compression_level=lz4.frame.COMPRESSIONLEVEL_MINHC will instruct the LZ4 library to use a compression level of 3, the minimum for the high compression mode.

lz4.frame.COMPRESSIONLEVEL_MAX

Specifier for the maximum compression level.

Specifying compression_level=lz4.frame.COMPRESSIONLEVEL_MAX will instruct the LZ4 library to use a compression level of 16, the highest compression level available.

Block size

The following attributes can be used when setting the block_size argument.

lz4.frame.BLOCKSIZE_DEFAULT

Specifier for the default block size.

Specifying block_size=lz4.frame.BLOCKSIZE_DEFAULT will instruct the LZ4 library to use the default maximum block size. This is currently equivalent to lz4.frame.BLOCKSIZE_MAX64KB.

lz4.frame.BLOCKSIZE_MAX64KB

Specifier for a maximum block size of 64 kB.

Specifying block_size=lz4.frame.BLOCKSIZE_MAX64KB will instruct the LZ4 library to create blocks containing a maximum of 64 kB of uncompressed data.

lz4.frame.BLOCKSIZE_MAX256KB

Specifier for a maximum block size of 256 kB.

Specifying block_size=lz4.frame.BLOCKSIZE_MAX256KB will instruct the LZ4 library to create blocks containing a maximum of 256 kB of uncompressed data.

lz4.frame.BLOCKSIZE_MAX1MB

Specifier for a maximum block size of 1 MB.

Specifying block_size=lz4.frame.BLOCKSIZE_MAX1MB will instruct the LZ4 library to create blocks containing a maximum of 1 MB of uncompressed data.

lz4.frame.BLOCKSIZE_MAX4MB

Specifier for a maximum block size of 4 MB.

Specifying block_size=lz4.frame.BLOCKSIZE_MAX4MB will instruct the LZ4 library to create blocks containing a maximum of 4 MB of uncompressed data.