News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

SMAC : Compression with Arithmetic Coding+Context Mixing

Started by Jean-Marie, November 29, 2013, 09:15:28 PM

Previous topic - Next topic

jj2007

Quote from: Jean-Marie on November 30, 2013, 07:53:09 PM
@JJ2007: looks like you restored the registers in the wrong order :

Yep, that's right :t

I had erroneously assumed that Tasm is one of those intelligent assemblers that use the same order for multiple pushs & pops... :redface:

Correct version attached. Nonetheless, even your original exe fails...

qWord

At least for Win7-x64 it does work when specifying the linker option /LARGEADDRESSAWARE (the process allocates more than 2GB). IIRC, win32 systems must be configured to be "LARGEADDRESSAWARE" (which means processes can allocate up to ~3GB).
MREAL macros - when you need floating point arithmetic while assembling!

TWell

@Jean-Marie
Is this really nesessary to do?
;Allocate 1Gb of Memory for Table of Bit History
invoke GlobalAlloc,40H,40000000H

jj2007

Quote from: qWord on November 30, 2013, 11:51:17 PM
At least for Win7-x64 it does work when specifying the linker option /LARGEADDRESSAWARE (the process allocates more than 2GB). IIRC, win32 systems must be configured to be "LARGEADDRESSAWARE" (which means processes can allocate up to ~3GB).

Thanks, qWord. I tried that but no success. My system has 2 GB of RAM...

qWord

Quote from: jj2007 on December 01, 2013, 12:33:28 AMMy system has 2 GB of RAM...
actual his program allocates 1.5xGB, but the allocated memory cross the 2GB boarder - with the linker option you tell windows that your program can handle addresses above 2GB.
Regardless that, it may be time to upgrade your system :-D
MREAL macros - when you need floating point arithmetic while assembling!

Jean-Marie

@Twell : 1 Gb is a lot of memory I agree. We have to compute hashes for contexts of Order-6, Order-4 and Order-3 on each bit encoding (ie 6, 4, and 3 previous bytes). The problem with hash tables is collisions (different contexts will have the same hash). A big hash table will reduce the number of collisions, so the compression will be better. Note that collisions won't cause the program to crash, but will hurt the compression.
Using 512 Kb for small programs could be sufficent though, that's an idea.
Besides hash tables, there is another method using Binary trees, described by Mark Nelson in his tutorial, but it is too complicated for me!

TWell

Joke:
If programmer can't fit program to 2Gb, better to go marketing department.
Most of those are already there :badgrin:

dedndave

rather than trying to compress the entire file, you might consider
compressing it in "blocks" - say 64 KB or something

jj2007

Quote from: qWord on December 01, 2013, 12:39:54 AMit may be time to upgrade your system :-D

I'd like to but the motherboard doesn't permit more than 2GB.

Attached a version that works with less memory. Speed could be improved, but the compression ratio is WOW! For a typical big *.inc file, it beats FreeArc hands down :t
However, FreeArc wins with executables.

The attached version has timings by "step" (search the source for NanoTimer - it's the QPC):
smacking D:\Masm32\PellesC\Bin\poide.exe
Please Wait...
Step A took 238 ms
Step B took 2 ms
Step C took 11294 ms
Step D took 12 ms

Quote from: Jean-Marie on December 01, 2013, 12:49:07 AM
Using 512 Kb for small programs could be sufficent though, that's an idea.

It seems the same memory is required for compression and decompression - right?

TWell

here is POAsm version with Jochen changes + read file buffer change in line 116.

- Timppa


Jean-Marie

Yep, same memory is used for compression & decompression.
SMAC splits the file in blocks of 256 Mb. This limits the number of Read/Write operation with huge files.
With the first versions, I used smaller blocks but I had to consider larger blocks for compressing enwik9, which is 1 billion of bits.

Jean-Marie

I'm replacing the Squash function by Qword's one, but I have some trouble to convert the Double in xmm0 to an integer. The problem is that CVTSD2SI will convert to signed integers, so big values like 4294901760 will return indefinite integer value 80000000H, rather than FFFF0000H.   :redface:


@@Squash:
fstp qword ptr Temp
lea edi,Ranges.ModelStart
movsd xmm0,Temp ;xmm0=weighted average
movsd xmm1,@@minus16
movsd xmm2,@@16
comisd xmm0,xmm1 ;test if p<-15.999988
jb @@SQ2 ;if so, bound it
comisd xmm0,xmm2 ;test if p>15.999988
ja @@SQ3 ;if so, bound it
@@SQ1: movaps xmm1,xmm0
addsd xmm0,@@17
addsd xmm0,xmm0
cvttsd2si ecx,xmm0
imul ecx,48
lea ecx,@@tbl1[ecx]
movsd xmm0,[ecx+0*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+1*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+2*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+3*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+4*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+5*8] ;xmm0=2^32/(1+2^-p)
BUG: cvtsd2si eax,xmm0 ;eax=bit 0 upper limit
mov dword ptr [edi+4],-1 ;Upper limit of bit 1 always FFFFFFFFH
mov [edi],eax ;store bit 0 upper limit

Jean-Marie

By the way, I forgot to mention that SMAC is ranked in the Large Text Compression Benchmark, and in the Silesia Corpus !
http://mattmahoney.net/dc/text.html#1834
http://mattmahoney.net/dc/silesia.html


jj2007

@@Stretch2 is worth a try, as it gets called really often, but it seems you have already invested a lot of efforts to make it fast.

Jean-Marie

@jj2007 : yes, this is definitely a hotspot, as it is called 6 times by bit-encoding. Every little change can be measured in terms of seconds when compressing a big file!