QuoteA better, faster and stronger spiritual successor to BZip2. Features higher compression ratios and better performance thanks to a order-0 context mixing entropy coder, a fast Burrows-Wheeler transform code making use of suffix arrays and a RLE with Lempel Ziv+Prediction pass based on LZ77-style string matching and PPM-style context modeling.
Like its ancestor, BZip3 excels at compressing text or code.
https://github.com/kspalaiologos/bzip3 (https://github.com/kspalaiologos/bzip3)
Thank you for the link, Erol. :thumbsup: Are there any advantages over 7zip?
Hi Gunther,
7-Zip is an archiver and can pack multiple files while the bzip family can process only one file at one time.
Erol,
Quote from: Vortex on January 30, 2023, 06:36:23 AM
7-Zip is an archiver and can pack multiple files while the bzip family can process only one file at one time.
thank you for the quick reply. I will have a look at the C sources anyway - as soon as I find time.
Hi Erol,
I made a quick test using the binary found here (https://encode.su/threads/3867-BZip3?p=74343&viewfull=1#post74343).
bzip3 compressed a 2.7MB source to 368,764 bytes
FreeArc compressed the same source to 373,802 bytes
So far, so good. But bzip3 took 630 milliseconds, while FreeArc needed only 422 milliseconds to do the job.
Had a quick play, file size reduction of win64.inc went from 786888 down to 108846 bytes.
Setting the thread count increases the speed of compression. On this old 6 core, --j 12 made it fast. Tested on an 800 meg video file.
Better compression than any option in Winrar.
Some encode.su forum members have already won the hutter prize.
Following the same dilemma, speed or better compression?
You can use wrt44.zip (preprocessor) with bzip3.
Zpaq has good speed but lower compression.
paq (Matt Mahoney) series has good compression but lower speed.
http://mattmahoney.net/dc/ (http://mattmahoney.net/dc/)
http://mattmahoney.net/dc/paq.html (http://mattmahoney.net/dc/paq.html)
$./paq8l -1 win64.inc
78215 bytes, Time 6.55 sec, used 38862968 bytes of memory
$./paq8l -8 win64.inc
61151 bytes, Time 44.14 sec, used 1643021320 bytes of memory
There are some versions of paq containing dictionaries that can compress more. We can create our own dictionary. Some mixing contents done to jpg,bmp,wav,... .
Jpeg used huffman in past because patents, now I see persons using arithmetic coding instead. Probably old mpeg,jpeg files can be compressed.
I made more tests. With option -m9, FreeArc has much better compression and is still significantly faster.
Hi Jochen,
Those versions compiled for Windows XP are outdated. You can get the latest release from here:
https://github.com/kspalaiologos/bzip3/releases (https://github.com/kspalaiologos/bzip3/releases)
The MSYS2 development system is providing a ready package for bzip3 :
# pacman -S mingw-w64-x86_64-bzip3
resolving dependencies...
looking for conflicting packages...
Packages (1) mingw-w64-x86_64-bzip3-1.2.2-1
Total Download Size: 0.16 MiB
Total Installed Size: 0.52 MiB
:: Proceed with installation? [Y/n] y
:: Retrieving packages...
mingw-w64-x86_64-bzip3-1.2... 159.6 KiB 93.8 KiB/s 00:02 [###############################] 100%
.
.
:: Processing package changes...
(1/1) installing mingw-w64-x86_64-bzip3 [###############################] 100%
Dependencies :
# ldd /mingw64/bin/bzip3.exe
ntdll.dll => /c/Windows/SYSTEM32/ntdll.dll (0x77970000)
kernel32.dll => /c/Windows/system32/kernel32.dll (0x77850000)
KERNELBASE.dll => /c/Windows/system32/KERNELBASE.dll (0x7fefd810000)
libbzip3-0.dll => /mingw64/bin/libbzip3-0.dll (0x7fefa540000)
msvcrt.dll => /c/Windows/system32/msvcrt.dll (0x7fefecb0000)
libwinpthread-1.dll => /mingw64/bin/libwinpthread-1.dll (0x7fefb0a0000)
Building a static binary with less dependencies :
https://github.com/kspalaiologos/bzip3/blob/master/PORTING.md (https://github.com/kspalaiologos/bzip3/blob/master/PORTING.md)
64-bit :
./configure CC=x86_64-w64-mingw32-gcc --host x86_64-w64-mingw32 --enable-static-exe
make
Dependencies :
# ldd ./bzip3.exe
ntdll.dll => /c/Windows/SYSTEM32/ntdll.dll (0x77970000)
kernel32.dll => /c/Windows/system32/kernel32.dll (0x77850000)
KERNELBASE.dll => /c/Windows/system32/KERNELBASE.dll (0x7fefd810000)
msvcrt.dll => /c/Windows/system32/msvcrt.dll (0x7fefecb0000)
32-bit :
./configure CC=i686-w64-mingw32 --host i686-w64-mingw32 --enable-static-exe
make
Dependencies :
# ldd ./bzip3.exe
ntdll.dll => /c/Windows/SYSTEM32/ntdll.dll (0x77970000)
ntdll.dll => /c/Windows/SysWOW64/ntdll.dll (0x77b30000)
wow64.dll => /c/Windows/SYSTEM32/wow64.dll (0x74e80000)
wow64win.dll => /c/Windows/SYSTEM32/wow64win.dll (0x74e20000)
wow64cpu.dll => /c/Windows/SYSTEM32/wow64cpu.dll (0x74e10000)
Apart from testing the pre-built binaries which work OK, is there a 64 bit COFF library module that can be linked into a 64 bit app ?
Quote from: Vortex on January 31, 2023, 06:36:14 AM
Hi Jochen,
Those versions compiled for Windows XP are outdated. You can get the latest release from here:
https://github.com/kspalaiologos/bzip3/releases
Hi Erol,
There are a dozen archives, each with a dozen strange files. I won't download them all, but if you know which one has a functioning executable, please let me know. I found one functioning executable so far, not on that site (see above), and its performance was really disappointing, especially given the hype that I see on GitHub.
Quote from: hutch-- on January 30, 2023, 10:31:56 AM
Had a quick play, file size reduction of win64.inc went from 786888 down to 108846 bytes.
Quote136 ms for decompressing with bzip3
FreeArc 0.666 extracting archive: w64.arc
Extracted 1 file, 96,807 => 752,296 bytes. Ratio 12.8%
Extraction time: cpu 0.02 secs, real 0.02 secs. Speed 37,615 kB/s
All OK
120 ms for decompressing with FreeArc
See attachment. FreeArc is here (https://freearc.en.softonic.com/).
Quote from: hutch-- on January 31, 2023, 08:47:22 AM
Apart from testing the pre-built binaries which work OK, is there a 64 bit COFF library module that can be linked into a 64 bit app ?
here are msvc 2022 compiled obj-files for testing.
Nice, Timo :thumbsup:
Now all we need is a bit of documentation :biggrin:
from main.c
if (mode == MODE_ENCODE) {
s32 read_count;
while (!feof(input_des)) {
read_count = xread(buffer, 1, block_size, input_des);
bytes_read += read_count;
if(read_count == 0)
break;
s32 new_size = bz3_encode_block(state, buffer, read_count);
if (new_size == -1) {
fprintf(stderr, "Failed to encode a block: %s\n", bz3_strerror(state));
return 1;
}
write_neutral_s32(byteswap_buf, new_size);
xwrite(byteswap_buf, 4, 1, output_des);
write_neutral_s32(byteswap_buf, read_count);
xwrite(byteswap_buf, 4, 1, output_des);
xwrite(buffer, new_size, 1, output_des);
bytes_written += 8 + new_size;
}
fflush(output_des);
} else if (mode == MODE_DECODE) {
s32 new_size, old_size;
while (!feof(input_des)) {
if (!xread_eofcheck(&byteswap_buf, 1, 4, input_des)) continue;
new_size = read_neutral_s32(byteswap_buf);
xread_noeof(&byteswap_buf, 1, 4, input_des);
old_size = read_neutral_s32(byteswap_buf);
xread_noeof(buffer, 1, new_size, input_des);
bytes_read += 8 + new_size;
if (bz3_decode_block(state, buffer, new_size, old_size) == -1) {
fprintf(stderr, "Failed to decode a block: %s\n", bz3_strerror(state));
return 1;
}
xwrite(buffer, old_size, 1, output_des);
bytes_written += old_size;
}
fflush(output_des);
}
decompress-file.c
/* Decompress a file SEQUENTIALLY (i.e. *not* in parallel) using bzip3 high level API. */
/* This is just a demonstration of bzip3 library usage, it does not contain all the necessary error checks and will not
* support cross-endian encoding/decoding. */
#include <libbz3.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char ** argv) {
if (argc != 3) {
printf("Usage: %s <input file> <output file>");
return 1;
}
// Read the entire input file to memory:
FILE * fp = fopen(argv[1], "rb");
fseek(fp, 0, SEEK_END);
size_t size = ftell(fp);
fseek(fp, 0, SEEK_SET);
uint8_t * buffer = malloc(size);
fread(buffer, 1, size, fp);
fclose(fp);
// Decompress the file:
size_t orig_size = *(size_t *)buffer;
uint8_t * outbuf = malloc(orig_size);
int bzerr = bz3_decompress(buffer + sizeof(size_t), outbuf, size - sizeof(size_t), &orig_size);
if (bzerr != BZ3_OK) {
printf("bz3_decompress() failed with error code %d", bzerr);
return 1;
}
FILE * outfp = fopen(argv[2], "wb");
fwrite(outbuf, 1, orig_size, outfp);
fclose(outfp);
printf("OK, %d => %d", size, orig_size);
return 0;
}
hl-api.c
#include <libbz3.h>
#include <stdio.h>
#include <stdlib.h>
#define MB (1024 * 1024)
int main(void) {
printf("Compressing shakespeare.txt back and forth in memory.\n");
// Read the entire "shakespeare.txt" file to memory:
FILE * fp = fopen("shakespeare.txt", "rb");
fseek(fp, 0, SEEK_END);
size_t size = ftell(fp);
fseek(fp, 0, SEEK_SET);
char * buffer = malloc(size);
fread(buffer, 1, size, fp);
fclose(fp);
// Compress the file:
size_t out_size = bz3_bound(size);
char * outbuf = malloc(out_size);
int bzerr = bz3_compress(1 * MB, buffer, outbuf, size, &out_size);
if (bzerr != BZ3_OK) {
printf("bz3_compress() failed with error code %d", bzerr);
return 1;
}
printf("%d => %d\n", size, out_size);
// Decompress the file.
bzerr = bz3_decompress(outbuf, buffer, out_size, &size);
if (bzerr != BZ3_OK) {
printf("bz3_decompress() failed with error code %d", bzerr);
return 1;
}
printf("%d => %d\n", out_size, size);
free(buffer);
free(outbuf);
return 0;
}
tests done in linux x86-64
compress and decompress in memory: test.asm
.X64
;this program test compression and decompression in memory
include ./../inc/c.inc
include ./../inc/glib.inc
include ./../inc/macros.inc
comprima proto :gpointer,:guint64
decompress proto :gpointer,:guint64
include libbz3.INC
.data?
align 16
p_output_name dq ?
.code
main proc uses rbx r12 r13 r14 r15 _argc:dword,_argv:ptr
local argc:guint32
local argv:gpointer
local error:ptr _GError
local p_input:qword
local input_sz:qword
;-------------------------------------------------------------
;FUNCTION PARAMETERS
;-------------------------------------------------------------
mov argc,_argc
mov argv,_argv
;-------------------------------------------------------------
;OPEN FILE
;-------------------------------------------------------------
.if argc == 1
invoke printf,CStr("Usage: test c uncompressed compressed",10)
.elseif argc == 4
mov rax,argv
add rax,8
mov rax,[rax]
movzx rax,byte ptr [rax]
.if rax == "c"
mov rbx,argv
add rbx,8*3
mov rbx,[rbx]
mov p_output_name,rbx
mov rbx,argv
add rbx,16
mov rbx,[rbx]
invoke g_get_current_dir
invoke g_build_filename,rax,rbx,NULL
mov error,0
invoke g_file_get_contents,rax,addr p_input,addr input_sz,addr error
.if eax == FALSE
mov rax,error
invoke printf,CStr("%s",10),[rax]._GError.message
invoke exit,1
.endif
invoke comprima,p_input,input_sz
.endif
.endif
invoke exit,0
main endp
align 16
comprima proc uses rbx r12 r13 r14 r15 _pointer:qword,_ssize:guint64
local p_input:ptr ;input file
local sz_input:guint64
local p_output:ptr ;compressed file in memory
local sz_output:guint64
local p_descompresso:ptr ;decompressed file in memory
local sz_descompresso:guint64
local error:ptr _GError
mov p_input,_pointer
mov sz_input,_ssize
invoke bz3_version ;check bz2 version
invoke printf,CStr("bzip3 version: %s",10),rax
mov rdi,sz_input
call bz3_bound ;get necessary memory
mov sz_output,rax
invoke g_malloc0,sz_output
mov p_output,rax
;p_input == uncompressed file ptr
;p_output == compressed file ptr
;sz_input == uncompressed file size
;sz_output == compressed file size
invoke bz3_compress,8*1024*1024,p_input,p_output,sz_input,addr sz_output
.if rax != BZ3_OK
invoke printf,CStr("compression failed with error code %d",10),rax
invoke exit,1
.endif
invoke printf,CStr("not compressed: %d",09,"compressed: %d",10),sz_input,sz_output
invoke g_malloc0,sz_input
mov p_descompresso,rax
mov rax,sz_input
mov sz_descompresso,rax
invoke bz3_decompress,p_output,p_descompresso,sz_output,addr sz_descompresso
.if rax != BZ3_OK
invoke printf,CStr("decompression failed with error code %d",10),rax
invoke exit,1
.endif
invoke printf,CStr("compressed: %d",09,"decompressed: %d",10),sz_output,sz_descompresso
ret
comprima endp
end main
this create a compatible bzip3 file to compress and decompress
edited, erased, don't work as expected. My fault.
this is the header file libbz3.h to be converted to libbz3.inc
/*
* BZip3 - A spiritual successor to BZip2.
* Copyright (C) 2022 Kamila Szewczyk
*
* This program is free software: you can redistribute it and/or modify it
* under the terms of the GNU Lesser General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option)
* any later version.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU Lesser General Public License along with
* this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _LIBBZ3_H
#define _LIBBZ3_H
#include <stddef.h>
#include <stdint.h>
/* Symbol visibility control. */
#ifndef BZIP3_VISIBLE
#if defined(__GNUC__) && (__GNUC__ >= 4) && !defined(__MINGW32__)
#define BZIP3_VISIBLE __attribute__((visibility("default")))
#else
#define BZIP3_VISIBLE
#endif
#endif
#if defined(BZIP3_DLL_EXPORT) && (BZIP3_DLL_EXPORT == 1)
#define BZIP3_API __declspec(dllexport) BZIP3_VISIBLE
#elif defined(BZIP3_DLL_IMPORT) && (BZIP3_DLL_IMPORT == 1)
#define BZIP3_API __declspec(dllimport) BZIP3_VISIBLE
#else
#define BZIP3_API BZIP3_VISIBLE
#endif
#define BZ3_OK 0
#define BZ3_ERR_OUT_OF_BOUNDS -1
#define BZ3_ERR_BWT -2
#define BZ3_ERR_CRC -3
#define BZ3_ERR_MALFORMED_HEADER -4
#define BZ3_ERR_TRUNCATED_DATA -5
#define BZ3_ERR_DATA_TOO_BIG -6
#define BZ3_ERR_INIT -7
struct bz3_state;
/**
* @brief Get bzip3 version.
*/
BZIP3_API const char * bz3_version(void);
/**
* @brief Get the last error number associated with a given state.
*/
BZIP3_API int8_t bz3_last_error(struct bz3_state * state);
/**
* @brief Return a user-readable message explaining the cause of the last error.
*/
BZIP3_API const char * bz3_strerror(struct bz3_state * state);
/**
* @brief Construct a new block encoder state, which will encode blocks as big as the given block size.
* The decoder will be able to decode blocks at most as big as the given block size.
* Returns NULL in case allocation fails or the block size is not between 65K and 511M
*/
BZIP3_API struct bz3_state * bz3_new(int32_t block_size);
/**
* @brief Free the memory occupied by a block encoder state.
*/
BZIP3_API void bz3_free(struct bz3_state * state);
/**
* @brief Return the recommended size of the output buffer for the compression functions.
*/
BZIP3_API size_t bz3_bound(size_t input_size);
/* ** HIGH LEVEL APIs ** */
/**
* @brief Compress a block of data. This function does not support parallelism
* by itself, consider using the low level `bz3_encode_blocks()` function instead.
* Using the low level API might provide better performance.
* Returns a bzip3 error code; BZ3_OK when the operation is successful.
* Make sure to set out_size to the size of the output buffer before the operation;
* out_size must be at least equal to `bz3_bound(in_size)'.
*/
BZIP3_API int bz3_compress(uint32_t block_size, const uint8_t * in, uint8_t * out, size_t in_size, size_t * out_size);
/**
* @brief Decompress a block of data. This function does not support parallelism
* by itself, consider using the low level `bz3_decode_blocks()` function instead.
* Using the low level API might provide better performance.
* Returns a bzip3 error code; BZ3_OK when the operation is successful.
* Make sure to set out_size to the size of the output buffer before the operation.
*/
BZIP3_API int bz3_decompress(const uint8_t * in, uint8_t * out, size_t in_size, size_t * out_size);
/* ** LOW LEVEL APIs ** */
/**
* @brief Encode a single block. Returns the amount of bytes written to `buffer'.
* `buffer' must be able to hold at least `bz3_bound(size)' bytes. The size must not
* exceed the block size associated with the state.
*/
BZIP3_API int32_t bz3_encode_block(struct bz3_state * state, uint8_t * buffer, int32_t size);
/**
* @brief Decode a single block.
* `buffer' must be able to hold at least `orig_size' bytes. The size must not exceed the block size
* associated with the state.
* @param size The size of the compressed data in `buffer'
* @param orig_size The original size of the data before compression.
*/
BZIP3_API int32_t bz3_decode_block(struct bz3_state * state, uint8_t * buffer, int32_t size, int32_t orig_size);
/**
* @brief Encode `n' blocks, all in parallel.
* All specifics of the `bz3_encode_block' still hold. The function will launch a thread for each block.
* The compressed sizes are written to the `sizes' array. Every buffer is overwritten and none of them can overlap.
* Precisely `n' states, buffers and sizes must be supplied.
*
* Expects `n' between 2 and 16.
*
* Present in the shared library only if -lpthread was present during building.
*/
BZIP3_API void bz3_encode_blocks(struct bz3_state * states[], uint8_t * buffers[], int32_t sizes[], int32_t n);
/**
* @brief Decode `n' blocks, all in parallel.
* Same specifics as `bz3_encode_blocks', but doesn't overwrite `sizes'.
*/
BZIP3_API void bz3_decode_blocks(struct bz3_state * states[], uint8_t * buffers[], int32_t sizes[],
int32_t orig_sizes[], int32_t n);
#endif
Hi Jochen,
QuoteBzip3's performance is heavily dependent on the compiler. x64 Linux clang13 builds usually can go as high as 17MiB/s compression and 23MiB/s decompression per thread. Windows and 32-bit builds might be considerably slower.
https://github.com/kspalaiologos/bzip3 (https://github.com/kspalaiologos/bzip3)
Attached zip file including the 32-bit and 64-bit binaries built with Msys2.
Jotti's report :
https://virusscan.jotti.org/en-US/filescanjob/jpi2ix4ovi (https://virusscan.jotti.org/en-US/filescanjob/jpi2ix4ovi)
Please rename the extension of the attachment to .7z :
ren bzip3-Msys2.zip bzip3-Msys2.7z
The original zip compression is exceeding the limit of the maximum file size, 500 Kb
Quote from: Vortex on February 01, 2023, 04:47:50 AMAttached zip file including the 32-bit and 64-bit binaries built with Msys2.
Hi Erol,
bzip3-64bit.exe is indeed significantly faster than FreeArc :thumbsup:
The compression ratio for Win64.inc (752,296 bytes), however, is mediocre:
97066 Win64.arc
104168 Win64.bz3Another test where FreeArc shines:
5247 ms for bzip3-64: 3656710 01.02.2023 00:58:34 Bible20.bz3
5773 ms for bzip3-32: 3656710 01.02.2023 00:58:40 Bible20.bz3
Compressing 1 file, 83,986,682 bytes. Processed
2696 ms for compressing with FreeArc
692568 16.05.2016 18:21:18 Bible20.arc
An Excel database:
8934 ms for bzip3-64: 3506850 01.02.2023 01:10:36 MDG_Database.bz3
10126 ms for bzip3-32: 3506850 01.02.2023 01:10:46 MDG_Database.bz3
Compressing 1 file, 42,340,864 bytes. Processed
9976 ms for compressing with FreeArc
1754309 04.05.2015 13:08:39 MDG_Database.arc