Friday, February 25, 2011

Multi-threaded compression experiment : LZ4 version 0.7

Putting into practice conclusions from an earlier post on multi-threading, I finally created a code branch using multi-threading for LZ4.
This is an opportunity to present version 0.7, which for now limit multi-threading to the "benchmark mode" only.

Results are interesting : speed improvement is very noticeable.
However, using 2 threads does not translate into exactly X2 performance (we are more into the +80-90% range).

There are at least 2 reasons for this (that i have identified).
First, not all segments compress as fast as others. This is expected : compression speed is partly dependent on file type being compressed. On a large file, some segments are easier and faster to compress than others. This is exactly what's happening here : i'm effectively impacted by the "slowest segment" which decides the final speed of algorithm.
As an interesting side-effect, using more threads than cores can sometimes lead to increased performance, which is counter-intuitive, but is simply a proof that by "distributing the load", we avoid having one thread with an "easy part" to compress, then waiting for the second one with a "worse part" to handle.

As a consequence, a more clever way to "segment" input data would help to improve the issue.


Second, there are some limits which cannot be improved, in this case, the bus speed between CPU and main memory. On reaching 2GB/s, there is a kind of "wall speed" which cannot be crossed. This effect is not too large with "only" 2 threads, but on quad-core systems is likely to have an impact, especially for decoding speed. There is nothing which can be done to improve upon this. But hopefully, this is not that much of an issue, since working at "memory bus speed" is quite fast enough.

Anyway, here are the results for the current version :


versionCompression RatioSpeedDecoding
LZ4 "Ultra Fast"0.72.062232 MB/s805 MB/s
LZ4 "Ultra Fast" - 2 threads0.72.062430 MB/s1510 MB/s

Since i only own a 2-cores system, i can't really test for more threads. But the code allows for up to 32 threads. So feel free to test on your own rig.


You can download the new version here.

Tuesday, February 22, 2011

Huff0 : Early detection offers better speed (v0.7)

 In my earlier release, i skipped the integration of a simple to understand but more difficult to execute feature : early detection of border-cases segments.

This definition regroups both "not compressible segments" and "single character segments". The previous algorithm was correctly detecting these cases, but during the construction of the Huffman tree.

The new algorithm ensures these situations are detected before any operation is done to build the tree. It does so by counting differently, in a new structure created for this purpose. The net result is a speed improvement for files which feature such situations :

Detailed performance assessment :


Huff0 v0.6

Huff0 v0.7

RatioCompress DecodingRatioCompressDecoding
Not compressible





enwik8.7z1.000810 MB/s1.93 GB/s1.000870 MB/s1.93 GB/s
Hardly compressible





win98-lz4hc-lit1.024465 MB/s600 MB/s1.024485 MB/s600 MB/s
audio11.070285 MB/s280 MB/s1.070285 MB/s280 MB/s
Distributed





enwik8-lz4hc-lit1.290204 MB/s194 MB/s1.290204 MB/s194 MB/s
Lightly Ordered





enwik8-lz4hc-offh1.131197 MB/s184 MB/s1.131197 MB/s184 MB/s
Ordered





enwik8-lz4hc-ml2.309214 MB/s195 MB/s2.309214 MB/s195 MB/s
Squeezed





office-lz4hc-run3.152218 MB/s202 MB/s3.152218 MB/s202 MB/s
enwik8-lz4hc-run4.959245 MB/s224 MB/s4.959245 MB/s224 MB/s
Ultra compressible





loong275785 MB/s2.93 GB/s275860 MB/s2.93 GB/s



This (mostly) closes the gap with Range0 regarding the detection speed of not compressible segments, which is the main case to consider for real-life situations.

You can download and test the new version here :
http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html

Sunday, February 20, 2011

Zhuff : new version v0.6

 Applying "sample" learnings to Zhuff, it results in a new version, v0.6, featuring a little compression improvement. The new sampling pattern improves compression, by 1 to 4% depending on file type. However, it also decreases compression speed, by about 5%. Therefore this is a matter of taste. On average, i believe it is a fair trade-off.

You can download it on the forum.