Tuesday, May 20, 2014

Streaming API for LZ4

For quite some time, the LZ4 Streaming API project has been repeatedly started and then delayed, as other priorities got in the way. To be fair, one important delaying factor was the difficulty of defining a "clean enough" API, something that would be simple to use and understand, while also providing the level of fine-tuning required by advanced programmers (typically within embedded environments).

I feel quite close today to breaking this shell, with an interface definition which matches my idea of a "clean enough" API. So let's share some preliminary results.

Current streaming interface

The current streaming API exposes the following functions:

void* LZ4_create (const char* inputBuffer);
int   LZ4_compress_continue (void* LZ4_Data, const char* source, char* dest, int inputSize);
char* LZ4_slideInputBuffer (void* LZ4_Data);
int   LZ4_free (void* LZ4_Data);

Although it works as intended, this API introduces some critical limitations.

First, there is a need to create a data structure which tracks the behavior of the stream.
This structure is the void* LZ4_Data, generated by LZ4_create().

It looks fine, but a first issue is that allocation happens inside LZ4_create(). In some circumstances, this may not be acceptable: sometimes allocation must be fully controlled by the host process. For example, standard functions such as malloc() may be unavailable. For this use case, two more functions were created:

int LZ4_sizeofStreamState(void);
int LZ4_resetStreamState(void* state, const char* inputBuffer);

It looks fine too, but is unfortunately still not usable for static allocation purposes (on the stack, for example).

The second issue is the const char* inputBuffer argument, specified at creation stage: it remains fixed during the entire compression process. It's not possible to jump between memory segments. This is a major limitation, preventing "in-place" compression of scattered memory segments.

LZ4_compress_continue() looks fine as a function definition, but its documented requirement is less so: up to 64KB of previously processed data must stand *before* the data to be compressed. This is a major constraint, typically requiring data to be prepared in an intermediate buffer (which, in some circumstances, might prove an unaffordable burden).

Then there is LZ4_slideInputBuffer(), which is arguably not at the right responsibility level. Its mission is to copy the last 64KB of previously processed source data to the beginning of the input buffer. It exists because the content of void* LZ4_Data is not accessible.

No function is defined to simply load a dictionary: to achieve that goal, one has to compress a segment using LZ4_compress_continue(), throw away the result, and compress the next segment using the data accumulated so far.
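
To make the point more concrete, here is a rough sketch of the resulting workflow, based on the behavior described above (get_next_chunk() and send_block() are hypothetical placeholders for the host program's I/O, and buffer sizes are purely illustrative):

#include "lz4.h"

extern int  get_next_chunk(char* dst, int maxSize);   /* hypothetical data source */
extern void send_block(const char* data, int size);   /* hypothetical data sink */

#define RING_SIZE  (256 * 1024)   /* input buffer : must preserve the last 64KB of history */
#define CHUNK_SIZE (64 * 1024)
static char inBuffer [RING_SIZE];
static char outBuffer[LZ4_COMPRESSBOUND(CHUNK_SIZE)];

void old_streaming_example(void)
{
    void* lz4Data = LZ4_create(inBuffer);   /* allocation happens inside LZ4_create() */
    char* inPtr = inBuffer;
    for (;;)
    {
        int readSize = get_next_chunk(inPtr, CHUNK_SIZE);
        if (readSize <= 0) break;
        int cSize = LZ4_compress_continue(lz4Data, inPtr, outBuffer, readSize);
        send_block(outBuffer, cSize);
        inPtr += readSize;
        if (inPtr + CHUNK_SIZE > inBuffer + RING_SIZE)
            inPtr = LZ4_slideInputBuffer(lz4Data);   /* copies the last 64KB back to the buffer start */
    }
    LZ4_free(lz4Data);
}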

To summarize, yes, it can work, but it's cumbersome. One has to accept the idea that the data to compress must be prepared in an intermediate buffer where the above conditions can be met.
It's not convenient, and may sometimes not even be possible, for example due to allocation limitations.
It also has a negative impact on performance, due to the overhead of memory copy operations.

New streaming interface definition

While defining the streaming API, I constantly struggled to find the right balance between a "fully automated" API and a "user-controlled" one. The "fully automated" API would handle all kinds of workloads, generate data respecting the LZ4 Streaming Format with full headers and markers, and take care of allocating the required buffers. On the other hand, the "user-controlled" API would serve the needs of host programs which require full control over resource allocation.

I therefore settled on providing both.
The "user-controlled" version will be part of the LZ4 library. It's the simpler one, and only takes care of generating the compressed block format, chained or independent. It relies on the host process to handle all memory operations (allocation, position, and availability) and to provide its own custom format linking the successive blocks.

The "fully automated" API will be part of a new library, which will be called LZ4S for "LZ4 Streaming". It is basically a user program of the previous API.

In this blog post, we will therefore concentrate on the first API, since it is also the underlying foundation for the second one.

typedef struct { unsigned int table[LZ4_STREAMSIZE_U32]; } LZ4_stream_t;

int LZ4_compress_continue (void* LZ4_stream, const char* source, char* dest, int inputSize);

OK, let's focus on the most important differences with current API.

An LZ4_stream_t structure is exposed. It is provided to give the host program better control over memory management. For example, allocation can be static (stack, global) or dynamic, using any user-defined allocation function.

Obviously, the host program is in charge of both allocating and freeing this memory space. It may also duplicate it "at will", which is a good idea for "static dictionary compression" scenarios.

Cleaning (or resetting) an LZ4_stream_t is just a matter of memset()-ing it to zero. Doing so once, before using the structure, is a requirement.
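
To illustrate, here is a small sketch of the possible allocation strategies, all under host control; it only assumes the LZ4_stream_t definition given below:

#include <stdlib.h>
#include <string.h>
#include "lz4.h"

static LZ4_stream_t globalStream;                            /* static (global) allocation */

void allocation_examples(void)
{
    LZ4_stream_t stackStream;                                /* stack allocation */
    LZ4_stream_t* heapStream = malloc(sizeof(*heapStream));  /* or any custom allocator */
    if (heapStream == NULL) return;

    /* mandatory reset, once, before first use */
    memset(&globalStream, 0, sizeof(globalStream));
    memset(&stackStream,  0, sizeof(stackStream));
    memset(heapStream,    0, sizeof(*heapStream));

    /* ... compress using any of them ... */
    free(heapStream);
}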

LZ4_stream_t's size is controlled by the value LZ4_STREAMSIZE_U32, which is checked at compile time thanks to a static assert, a technique already used within xxHash. It mostly depends on the size of LZ4's hash table (typically about 16KB). The hash table size is itself controlled by the defined value LZ4_MEMORY_USAGE. Up to now, this define was located in lz4.c; to be coherent with the new interface, it will be moved to lz4.h instead. I don't foresee any issue with that.
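
For reference, such a compile-time check can be written with a classic C idiom; the exact macro used in lz4.c / xxHash may differ slightly, this is just an illustration of the technique:

/* produces a compile-time error (division by zero in a constant expression) whenever c is false */
#define LZ4_STATIC_ASSERT(c)   { enum { LZ4_static_assert = 1/(int)(!!(c)) }; }

static void check_stream_size(void)
{
    /* trivially true here, shown only to illustrate where such a check would live */
    LZ4_STATIC_ASSERT(sizeof(LZ4_stream_t) == LZ4_STREAMSIZE_U32 * sizeof(unsigned int));
}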

LZ4_stream_t is described as a table of unsigned int. This is intentional. The real usage of this memory bank remains hidden inside lz4.c. This way, internal variables within the structure cannot be used as a kind of implicit interface contract, allowing future modifications to happen without breaking compatibility with existing programs.

Once the streaming structure is created, you can start to populate it by loading a static dictionary, using:

int LZ4_loadDict (void* LZ4_stream, const char* source, int size);

This part is optional. Loading a dictionary flushes any prior data reference from the LZ4_stream_t, if there is any, so you may also use this function to initialize an LZ4_stream_t structure, by simply passing a size of 0.
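
A short sketch of the intended usage (dictBuffer, src, dst and the various sizes are assumed to be provided by the caller, with dst large enough):

#include <string.h>
#include "lz4.h"

int compress_with_dict(const char* dictBuffer, int dictSize,
                       const char* src, int srcSize, char* dst)
{
    LZ4_stream_t lz4Stream;
    memset(&lz4Stream, 0, sizeof(lz4Stream));        /* LZ4_loadDict(&lz4Stream, NULL, 0) would also reset it */

    LZ4_loadDict(&lz4Stream, dictBuffer, dictSize);  /* optional : reference a static dictionary */
    return LZ4_compress_continue(&lz4Stream, src, dst, srcSize);   /* dst assumed >= LZ4_compressBound(srcSize) */
}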

LZ4_compress_continue() looks the same as previously, and is indeed compatible, but a major difference is that it no longer requires the next data block to be compressed "right after" the previous one. Previous and next data can be anywhere in memory. The LZ4_stream_t structure keeps track of their positions automatically.
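
For example, a set of segments scattered in memory could be compressed as a single chained stream along these lines (segments[], sizes[] and write_block() are hypothetical, and each segment is assumed to remain valid while the stream is in use):

#include <string.h>
#include "lz4.h"

extern void write_block(const char* data, int size);   /* hypothetical sink */

void compress_segments(const char* segments[], const int sizes[], int nbSegments)
{
    LZ4_stream_t lz4Stream;
    memset(&lz4Stream, 0, sizeof(lz4Stream));

    char dst[LZ4_COMPRESSBOUND(64*1024)];   /* assumes each segment is <= 64KB */
    for (int i = 0; i < nbSegments; i++)
    {
        int cSize = LZ4_compress_continue(&lz4Stream, segments[i], dst, sizes[i]);
        write_block(dst, cSize);            /* each block depends on the previous ones (chained) */
    }
}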

One strong assumption of LZ4_compress_continue() is that previously compressed data is still available. Unfortunately, this might not be the case, and the function cannot guess that.
Should previously compressed data no longer be accessible, for example because you need the space for something else, or because you can't guarantee its availability, it's possible to solve the situation:
  • You can "save" the relevant data segment from previous block (typically, the last 64KB, it can also be less than that) into a "safe" place. At which point, you will need to indicate this change of position.
int LZ4_moveDict (void* LZ4_stream, char* safeBuffer, int dictSize);

The memory space available at char* safeBuffer must be at least dictSize, since that is the amount of data LZ4_moveDict() will copy there (note: the maximum amount of data that will be copied is 64KB, even if dictSize is larger).
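
A sketch of that scenario could look like this (sizes are illustrative; the source buffer being recycled is left implicit):

#include "lz4.h"

#define DICT_SIZE (64 * 1024)
static char safeBuffer[DICT_SIZE];

void relocate_history(LZ4_stream_t* lz4Stream)
{
    /* copies up to 64KB of recent history into safeBuffer, and updates the
       tracking structure so that future blocks reference it at its new position */
    LZ4_moveDict(lz4Stream, safeBuffer, DICT_SIZE);
    /* the memory which previously held the source data can now be safely reused */
}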


Decompression

The current streaming decompression API is defined as follows:

int LZ4_decompress_safe_withPrefix64k (const char* source, char* dest, int compressedSize, int maxOutputSize);
int LZ4_decompress_fast_withPrefix64k (const char* source, char* dest, int originalSize);

Its documented requirement is that previously decompressed data must stand *right before* the memory address where the next data block is going to be decompressed. It works, but implies the creation of a temporary buffer to match the requirement.

The new streaming API gets rid of this limitation, allowing previous data to stand anywhere within accessible memory:

int LZ4_decompress_safe_usingDict (const char* source, char* dest, int compressedSize, int maxOutputSize, const char* dictStart, int dictSize);
int LZ4_decompress_fast_usingDict (const char* source, char* dest, int originalSize, const char* dictStart, int dictSize);

The dictionary definition is part of the argument list (const char* dictStart, int dictSize), so, in contrast with compression, there is no need for a tracking structure.
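
Usage is therefore straightforward; here is a minimal sketch using the "safe" variant (all buffers are provided by the caller):

#include "lz4.h"

int decompress_block_with_dict(const char* cSrc, int cSize,
                               char* dst, int dstCapacity,
                               const char* dict, int dictSize)
{
    /* returns the number of bytes written into dst, or a negative value on error */
    return LZ4_decompress_safe_usingDict(cSrc, dst, cSize, dstCapacity, dict, dictSize);
}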



A first implementation of this API is currently available in the "dev" branch of LZ4 on GitHub. It is still at an early stage, so now is probably the right time to provide your comments, should you have any.

[Edit] Added : LZ4_moveDict() as a potential replacement of LZ4_SetDictPos()
[Edit] Added : Paragraph on decompression
[Edit] Modified : move to LZ4_moveDict()
[Edit] Interface definition converges towards LZ4_compress_continue()
[Edit] "streaming" branch now merged into "dev" branch