News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

SMAC : Compression with Arithmetic Coding+Context Mixing

Started by Jean-Marie, November 29, 2013, 09:15:28 PM

Previous topic - Next topic

qWord

Quote from: Jean-Marie on December 01, 2013, 12:15:40 PM
I'm replacing the Squash function by Qword's one, but I have some trouble to convert the Double in xmm0 to an integer.
yepp, only SDWORDs are supported for x32 - it may be possible to change the type in ADAPTIVEMODEL to REAL8? Even these integer <-> FP conversions are generally "slow"  and should be avoided as far as possible.
MREAL macros - when you need floating point arithmetic while assembling!

dedndave

somewhere, i saw a snippet that used a shuffle to get it done

Jean-Marie

#32
Sadly, we need integer versions of Upper & Lower limits since HighRange, LowRange & Ranges are kept as integers.
Once the most significant byte in High & Low matches, we output it as an encoded byte in our destination buffer.

I'm going to test a quick & dirty solution for the moment till I find something better.
Like:
movsd Temp,xmm0
fld Temp
Fistp qword ptr [edi]

Jean-Marie

Well that's impressive : the compression ratio is practically the same. Only 2 bytes are lost on enwik8, which is 100 000 000 bytes long! I clocked a gain of nearly 30 seconds on the same file. And the code can still be improved.
Much obliged, pals  :eusa_dance:


@@Squash:
fstp qword ptr Temp
lea edi,Ranges.ModelStart
movsd xmm0,Temp ;xmm0=weighted average
movsd xmm1,@@minus16
movsd xmm2,@@plus16
comisd xmm0,xmm1 ;test if p<-15.999988
jb @@SQ2 ;if so, bound it
comisd xmm0,xmm2 ;test if p>15.999988
ja @@SQ3 ;if so, bound it
@@SQ1: movaps xmm1,xmm0
addsd xmm0,@@16
addsd xmm0,xmm0
cvttsd2si ecx,xmm0
imul ecx,48
lea ecx,@@tbl1[ecx]
movsd xmm0,[ecx+0*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+1*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+2*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+3*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+4*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+5*8] ;xmm0=2^32/(1+2^-p)
movsd Temp,xmm0 ;st=bit 0 upper limit
fld Temp
fistp qword ptr [edi] ;store bit 0 upper limit
mov dword ptr [edi+4],-1 ;Upper limit of bit 1 always FFFFFFFFH
...
@@SQ2: movaps xmm0,xmm1 ;p=-15.999988
jmp @@SQ1
@@SQ3: movaps xmm0,xmm2 ;p=+15.999988
jmp @@SQ1
INCLUDE SQUASHTB.INC
@@TwoPower32 DQ 4294967296.0 ;2^32 (float)
@@16 DQ 16.0
@@plus16 DQ 15.999988
@@minus16 DQ -15.999988

qWord

Comparing your version from the first post with the latest, the speed grows about 6% (Intel i7QM, file=enwik8).  :t

Are the compares -16>x>16 really needed? Also, why 15.999988 and not 16?

regards, qWord
MREAL macros - when you need floating point arithmetic while assembling!

Jean-Marie

I have been quite inaccurate in my post #2. Actually, p generally ranges from -15,999978 to +15,999978 (cf enclosed picture). But in some rare cases, because the weights are unbounded, p might fall out from this range so we need to correct it.
The objective is to have a value of Squash(p) between [10000H; FFFF0000H], otherwise the calculation of the ranges would be subject to an underflow or overflow situation.
This is why I used those weird decimals : I calculated that Squash(-15.99998) returns 10000H, and Squash(+15.99998) returns FFFF0000H.
Squash(-16) returns FFFFH, and Squash(+16) returns a NAN, but may be this can be corrected by slightly modifying the tables.

Out of curiosity, qWord, what are the steps to transform the Squash formula in (Taylor?) polynomials approximation?
I'm wondering if a step of 1 would be sufficient.

qWord

Quote from: Jean-Marie on December 03, 2013, 10:24:51 AMOut of curiosity, qWord, what are the steps to transform the Squash formula in (Taylor?) polynomials approximation?
I'm wondering if a step of 1 would be sufficient.
For the polynomials I use the minimax approximation, which minimize the maximal error for a specific range. It would also work with Taylor series, but that require higher degree polynomials and/or finer intervals to get the same result.
In the attachment some more tables (range -16<=x<=16, degree 3, 5 and 7; step 1/2, 1 and 2). The maximal relative error can be found at top of each file.
 
MREAL macros - when you need floating point arithmetic while assembling!

Jean-Marie

Thanks a bunch qWord : I'm gonna look into all this  :icon14: