Compression Support
The main problem in supporting compressed files is random access. FSO needs to be able to perform all the usual file operations on a compressed file, fopen(), fseek(), fread() and fclose(), without having to decompress the entire file and keep it in RAM. .pof parsing in particular uses a lot of fseek() and small fread() calls. File compression algorithms are generally designed to decompress a file from start to finish in one go, so random access is not a standard feature of the compression format.
The first compression support was merged into the code in FSO 21.2.0. This first implementation uses LZ4/LZ4-HC and is referred to internally as "LZ41". It is mainly intended for loose files, but compressed files inside VPs are also supported.
LZ41
The LZ41 design is similar to the LZ4 random access example. It uses LZ4 stream compression, an int table storing the offsets that indicate which compressed block holds a given position in the original file, an int with the number of these offsets, the original file size, and the block size. The major difference is that, instead of using a dictionary, the stream is reset at each block. This makes the blocks independent of each other, so any block can be picked and decompressed on its own. Resetting the stream rather than using a dictionary may be less efficient (this is untested at the time of writing), but it was the only way to do it for individual files without also having to embed the dictionary. Like the official random access example, the LZ41 implementation does not follow the standard LZ4 format, as the LZ4 data format does not seem to account for what random access needs.
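The idea of independent blocks plus an offset table can be illustrated with a minimal Python sketch. This is not the FSO implementation: zlib from the standard library stands in for LZ4 so the example has no dependencies, the function names are hypothetical, and offsets are assumed to be counted from the start of the compressed payload.

```python
import zlib

def compress_blocks(data, block_size=65536):
    """Compress each block on its own and record where it starts.

    zlib is only a stand-in for LZ4 here (assumption for illustration).
    Returns (payload, offsets); offsets[i] is the start of block i in payload.
    """
    payload = bytearray()
    offsets = []
    for start in range(0, len(data), block_size):
        offsets.append(len(payload))
        # A fresh compressor per block keeps blocks independent of each other.
        payload += zlib.compress(data[start:start + block_size])
    return bytes(payload), offsets

def read_at(payload, offsets, original_size, block_size, pos, length):
    """Emulate fseek(pos) + fread(length) on the uncompressed data."""
    end = min(pos + length, original_size)
    out = bytearray()
    while pos < end:
        index = pos // block_size  # which block holds this position
        block_start = offsets[index]
        if index + 1 < len(offsets):
            block_end = offsets[index + 1]
        else:
            block_end = len(payload)
        # Only the block that is needed gets decompressed.
        block = zlib.decompress(payload[block_start:block_end])
        within = pos % block_size
        take = min(end - pos, len(block) - within)
        out += block[within:within + take]
        pos += take
    return bytes(out)
```

A small read that falls inside one block decompresses only that block, which is the access pattern that matters for many fseek() and small fread() calls. A real reader would likely also cache the last decompressed block instead of decompressing it again on every read.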
LZ41 Data Format
4-byte header
N blocks
N offsets
int num_offsets
int original_filesize
int blocksize
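Because the three fixed-size fields sit at the end of the file, a reader can locate the offset table by working backwards from the end. The sketch below shows that under assumptions the layout above does not state: every int is taken to be 4 bytes, signed and little-endian, and the function name is hypothetical. The header is returned as opaque bytes and is not checked.

```python
import struct

INT_SIZE = 4  # assumption: each "int" in the layout is 4 bytes

def parse_lz41_layout(buf):
    """Split a buffer laid out as: header | blocks | offsets | 3 trailing ints."""
    header = buf[:4]
    trailer_start = len(buf) - 3 * INT_SIZE
    # Assumed little-endian signed ints: num_offsets, original_filesize, blocksize.
    num_offsets, original_filesize, blocksize = struct.unpack_from(
        "<3i", buf, trailer_start)
    # The offset table sits directly before the three trailing ints.
    table_start = trailer_start - num_offsets * INT_SIZE
    offsets = list(struct.unpack_from("<%di" % num_offsets, buf, table_start))
    return header, offsets, original_filesize, blocksize
```

Reading the trailer first means the reader never has to scan the blocks to find where each one begins.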
LZ41 Tests
Uncompressed MVP 4.4.1: 12.9 GB (space used on disk)
Compressed MVP 4.4.1, block size 65536, LZ4-HC L6: 7.01 GB (space used on disk). Compression time: 5 minutes (1 thread).
MVP 4.4.1 - All FS2 Models mission load time
0:50 - NVMe, uncompressed
0:50 - NVMe, compressed (LZ4-HC L6, 7.01 GB, BS: 65536)
1:01 - SSD, compressed (LZ4-HC L6, 7.01 GB, BS: 65536)
1:02 - SSD, uncompressed
1:49 - HDD, compressed (LZ4-HC L6, 7.01 GB, BS: 65536)
2:30 - HDD, uncompressed
LZ41 Compressor Guidelines
Block Size
The block size is set by the compressor. Block sizes of 8192, 16384 and 65536 were all tested; higher values are untested and may or may not work. A larger block size makes large compressed files smaller, and it also makes FSO use slightly more RAM.
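One part of the block size trade-off is simple arithmetic: fewer, larger blocks mean a shorter offset table. The sketch below assumes 4-byte ints and a hypothetical function name. It counts only the offset table and the three trailing ints, not how well the blocks themselves compress.

```python
def index_overhead(file_size, block_size, int_size=4):
    """Return (number of blocks, bytes used by the offset table and trailer).

    int_size=4 is an assumption; the trailer is the three ints after the table.
    """
    num_blocks = -(-file_size // block_size)  # ceiling division
    return num_blocks, num_blocks * int_size + 3 * int_size

# A 1,000,000 byte file needs 123 blocks at 8192 but only 16 at 65536.
blocks_small, overhead_small = index_overhead(1_000_000, 8192)
blocks_large, overhead_large = index_overhead(1_000_000, 65536)
```

The table itself is small either way. The larger effect of block size on the compressed file is probably that each block is compressed without any knowledge of the others, so bigger blocks give the compressor more data to find matches in.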
Ignore text-based file formats
Text files (.fc2, .fs2, .tbl, .tbm, .eff) should be ignored. They work fine when compressed, but there is not much to be gained, and the only real effect is making the files unreadable in text editors.
Movies are already compressed
.mp4, .ogg, .mve, .webm: the ratio will be negative most of the time, and when it is not, the saving is only a few KB, maybe a MB or two.
Audio files
.ogg, .wav: .wav may give small gains, but it is not worth trying. .ogg has the same problem as the movies: these files are already compressed.
PNGs
PNGs are already compressed and end up with a negative ratio 99% of the time, and since mods use APNGs, trying to compress them is a waste of time. PNGs that are not compressed will shrink considerably, but these are uncommon; consider checking the PNG header for the compression it already uses before trying to compress.
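A compressor could apply these guidelines with a simple extension filter, plus a check that discards the result whenever the ratio turns out negative. The sketch below is one possible way to do it, not how any existing compressor tool works. The function names are hypothetical, and the extension list is taken from the guidelines above, with .png added for the PNG case.

```python
import os

# Extensions the guidelines above suggest skipping.
SKIP_EXTENSIONS = {
    ".fc2", ".fs2", ".tbl", ".tbm", ".eff",  # text-based formats
    ".mp4", ".ogg", ".mve", ".webm",         # movies, already compressed
    ".wav",                                  # audio, gains too small
    ".png",                                  # PNG/APNG, already compressed
}

def should_try_compress(filename):
    """Return False for file types that are not worth compressing."""
    extension = os.path.splitext(filename)[1].lower()
    return extension not in SKIP_EXTENSIONS

def keep_compressed(original_size, compressed_size):
    """Keep the compressed file only if it is actually smaller."""
    return compressed_size < original_size
```

The size check also covers file types that are not on the list: anything that does not shrink is simply left uncompressed.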