Author Topic: SMAC : Compression with Arithmetic Coding+Context Mixing  (Read 19173 times)

jj2007

  • Member
  • *****
  • Posts: 11450
  • Assembler is fun ;-)
    • MasmBasic
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #15 on: November 30, 2013, 08:57:43 PM »
@JJ2007: looks like you restored the registers in the wrong order :

Yep, that's right :t

I had erroneously assumed that Tasm is one of those intelligent assemblers that use the same order for multiple pushs & pops... :redface:

Correct version attached. Nonetheless, even your original exe fails...

qWord

  • Member
  • *****
  • Posts: 1476
  • The base type of a type is the type itself
    • SmplMath macros
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #16 on: November 30, 2013, 11:51:17 PM »
At least for Win7-x64 it does work when specifying the linker option /LARGEADDRESSAWARE (the process allocates more than 2GB). IIRC, win32 systems must be configured to be "LARGEADDRESSAWARE" (which means processes can allocate up to ~3GB).
MREAL macros - when you need floating point arithmetic while assembling!

TWell

  • Member
  • ****
  • Posts: 748
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #17 on: December 01, 2013, 12:11:33 AM »
@Jean-Marie
Is this really nesessary to do?
Code: [Select]
;Allocate 1Gb of Memory for Table of Bit History
invoke GlobalAlloc,40H,40000000H

jj2007

  • Member
  • *****
  • Posts: 11450
  • Assembler is fun ;-)
    • MasmBasic
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #18 on: December 01, 2013, 12:33:28 AM »
At least for Win7-x64 it does work when specifying the linker option /LARGEADDRESSAWARE (the process allocates more than 2GB). IIRC, win32 systems must be configured to be "LARGEADDRESSAWARE" (which means processes can allocate up to ~3GB).

Thanks, qWord. I tried that but no success. My system has 2 GB of RAM...

qWord

  • Member
  • *****
  • Posts: 1476
  • The base type of a type is the type itself
    • SmplMath macros
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #19 on: December 01, 2013, 12:39:54 AM »
My system has 2 GB of RAM...
actual his program allocates 1.5xGB, but the allocated memory cross the 2GB boarder - with the linker option you tell windows that your program can handle addresses above 2GB.
Regardless that, it may be time to upgrade your system :-D
MREAL macros - when you need floating point arithmetic while assembling!

Jean-Marie

  • Regular Member
  • *
  • Posts: 17
  • Waiting for TASM 6.0
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #20 on: December 01, 2013, 12:49:07 AM »
@Twell : 1 Gb is a lot of memory I agree. We have to compute hashes for contexts of Order-6, Order-4 and Order-3 on each bit encoding (ie 6, 4, and 3 previous bytes). The problem with hash tables is collisions (different contexts will have the same hash). A big hash table will reduce the number of collisions, so the compression will be better. Note that collisions won't cause the program to crash, but will hurt the compression.
Using 512 Kb for small programs could be sufficent though, that's an idea.
Besides hash tables, there is another method using Binary trees, described by Mark Nelson in his tutorial, but it is too complicated for me!

TWell

  • Member
  • ****
  • Posts: 748
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #21 on: December 01, 2013, 12:51:54 AM »
Joke:
If programmer can't fit program to 2Gb, better to go marketing department.
Most of those are already there :badgrin:

dedndave

  • Member
  • *****
  • Posts: 8829
  • Still using Abacus 2.0
    • DednDave
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #22 on: December 01, 2013, 03:17:22 AM »
rather than trying to compress the entire file, you might consider
compressing it in "blocks" - say 64 KB or something

jj2007

  • Member
  • *****
  • Posts: 11450
  • Assembler is fun ;-)
    • MasmBasic
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #23 on: December 01, 2013, 04:41:29 AM »
it may be time to upgrade your system :-D

I'd like to but the motherboard doesn't permit more than 2GB.

Attached a version that works with less memory. Speed could be improved, but the compression ratio is WOW! For a typical big *.inc file, it beats FreeArc hands down :t
However, FreeArc wins with executables.

The attached version has timings by "step" (search the source for NanoTimer - it's the QPC):
smacking D:\Masm32\PellesC\Bin\poide.exe
Please Wait...
Step A took 238 ms
Step B took 2 ms
Step C took 11294 ms
Step D took 12 ms

Using 512 Kb for small programs could be sufficent though, that's an idea.

It seems the same memory is required for compression and decompression - right?

TWell

  • Member
  • ****
  • Posts: 748
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #24 on: December 01, 2013, 05:32:46 AM »
here is POAsm version with Jochen changes + read file buffer change in line 116.

- Timppa


Jean-Marie

  • Regular Member
  • *
  • Posts: 17
  • Waiting for TASM 6.0
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #25 on: December 01, 2013, 10:33:08 AM »
Yep, same memory is used for compression & decompression.
SMAC splits the file in blocks of 256 Mb. This limits the number of Read/Write operation with huge files.
With the first versions, I used smaller blocks but I had to consider larger blocks for compressing enwik9, which is 1 billion of bits.

Jean-Marie

  • Regular Member
  • *
  • Posts: 17
  • Waiting for TASM 6.0
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #26 on: December 01, 2013, 12:15:40 PM »
I'm replacing the Squash function by Qword's one, but I have some trouble to convert the Double in xmm0 to an integer. The problem is that CVTSD2SI will convert to signed integers, so big values like 4294901760 will return indefinite integer value 80000000H, rather than FFFF0000H.   :redface:

Code: [Select]
@@Squash:
fstp qword ptr Temp
lea edi,Ranges.ModelStart
movsd xmm0,Temp ;xmm0=weighted average
movsd xmm1,@@minus16
movsd xmm2,@@16
comisd xmm0,xmm1 ;test if p<-15.999988
jb @@SQ2 ;if so, bound it
comisd xmm0,xmm2 ;test if p>15.999988
ja @@SQ3 ;if so, bound it
@@SQ1: movaps xmm1,xmm0
addsd xmm0,@@17
addsd xmm0,xmm0
cvttsd2si ecx,xmm0
imul ecx,48
lea ecx,@@tbl1[ecx]
movsd xmm0,[ecx+0*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+1*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+2*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+3*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+4*8]
mulsd xmm0,xmm1
addsd xmm0,[ecx+5*8] ;xmm0=2^32/(1+2^-p)
BUG: cvtsd2si eax,xmm0 ;eax=bit 0 upper limit
mov dword ptr [edi+4],-1 ;Upper limit of bit 1 always FFFFFFFFH
mov [edi],eax ;store bit 0 upper limit

Jean-Marie

  • Regular Member
  • *
  • Posts: 17
  • Waiting for TASM 6.0
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #27 on: December 01, 2013, 12:29:06 PM »
By the way, I forgot to mention that SMAC is ranked in the Large Text Compression Benchmark, and in the Silesia Corpus !
http://mattmahoney.net/dc/text.html#1834
http://mattmahoney.net/dc/silesia.html


jj2007

  • Member
  • *****
  • Posts: 11450
  • Assembler is fun ;-)
    • MasmBasic
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #28 on: December 01, 2013, 10:51:42 PM »
@@Stretch2 is worth a try, as it gets called really often, but it seems you have already invested a lot of efforts to make it fast.

Jean-Marie

  • Regular Member
  • *
  • Posts: 17
  • Waiting for TASM 6.0
Re: SMAC : Compression with Arithmetic Coding+Context Mixing
« Reply #29 on: December 01, 2013, 11:24:23 PM »
@jj2007 : yes, this is definitely a hotspot, as it is called 6 times by bit-encoding. Every little change can be measured in terms of seconds when compressing a big file!