News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

How difficult is it to build a shrinking/deflating routine?

Started by frktons, December 15, 2012, 10:58:49 AM

Previous topic - Next topic

MichaelW

For that 1202 byte data block Winzip can compress it to 523 bytes (in a 641-byte archive), and RtlCompressBuffer can compress it to 666 bytes at standard compression and 659 bytes at maximum compression. I think run-length encoding will not compress it enough to be worth the effort.

In the Data Compression Handbook, Second Edition, Mark Nelson and Jean-Loup Gailly, 1996, M&T books, ISBN 1-55851-434-1, order 3 arithmetic coding produced the highest compression ratio for text files, as well as the highest overall ratio across text, graphics, and executable files, tested against Huffman, Adaptive Huffman, and several variants each of LZSS and LZW.

Well Microsoft, here's another nice mess you've gotten us into.

Magnum

I have some code that has the ability to zip/unzip directly in memory without any intermediate files.

Is that something that would be of help to you ?

Andy
Take care,
                   Andy

Ubuntu-mate-18.04-desktop-amd64

http://www.goodnewsnetwork.org

jj2007

Quote from: MichaelW on December 15, 2012, 03:52:41 PM
For that 1202 byte data block Winzip can compress it to 523 bytes (in a 641-byte archive), and RtlCompressBuffer can compress it to 666 bytes

Probably the best alternative to reinventing the wheel.

RtlCompressBuffer: Microsoft started to document it, as well as many other API, with the new SDK for Seven.

There is much better compression available, like FreeArc used in MasmBasic's ZipFiles, but the authors rarely offer (de)compression of a buffer.

frktons

Quote from: Magnum on December 15, 2012, 04:09:37 PM
I have some code that has the ability to zip/unzip directly in memory without any intermediate files.

Is that something that would be of help to you ?

Andy

Yes Andy, it is this kind of routine that I need to implement.
Thanks
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

Quote from: jj2007 on December 15, 2012, 07:23:38 PM
RtlCompressBuffer
Probably the best alternative to reinventing the wheel.

There is much better compression available, like FreeArc used in MasmBasic's ZipFiles, but the authors rarely offer (de)compression of a buffer.

If I don't reinvent the wheel it is hard for me to learn some Assembly. :lol:
Good sort/zip/unzip algos are quite difficult to write, and a good dose of math
is usually necessary. Unfortunately I missed my dose.  ::)

I know, however, that you implemented a good one in MasmBasic.
It would be nice to see it in action with the string in the pgm that
I posted in this 3D.    :t
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

FORTRANS

Hi,

   Some data compression ratio data for text from "The
Data Compression Book", Mark Nelson, 1991, shows
Huffman about 40, Arithmetic about 40 to 70, LZSS about
55 to 60, and LZW about 50 to 60.  Higher numbers are
better ratios.  I did LZW for a GIF creation program and
part of a Huffman compressor to check the logic.  A
quick glance at my code shows the LZW at maybe three
times the size of the Huffman.  Hey guys, how would you
relate complexity to code size?

   MichaelW, my edition has 527 pages and 12 chapters, and
two appendices.  How does that compare to your copy?  He
did not have a chapter on Run Length Encoding.

   Dave, LZW for GIF files allow for 4,096 tokens, give or
take, some are predefined.  Which is 12 bits maximum.
The book above did up to 15 bit LZW.

   Frank, your text is short enough that straight LZW is
probably a poor match.  The more the token table is full
the better it compresses.  If have a lot of short pieces of
text, there are some "tricks" you can try to improve things.
Huffman is somewhat less size dependent for performance
(bigger is still better), but requires the decode table in the
compressed data hurting smaller files.

Regards,

Steve N.

frktons

Quote from: FORTRANS on December 16, 2012, 02:53:23 AM
   Frank, your text is short enough that straight LZW is
probably a poor match. 

Steve N.

I suppose you are right, Steve  :t

I think I'll use RtlCompressBuffer just to have some
algo to confront with, and afterwhile I'm going to try some
homemade code.

The task in itself is almost impossible to achieve with this
kind of text, but I'll give it a shot nevertheless.  :dazzled:
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

Magnum

Too big to post code, so there is an attachment.

It's in C++ but should be easy to convert.

Andy
Take care,
                   Andy

Ubuntu-mate-18.04-desktop-amd64

http://www.goodnewsnetwork.org

FORTRANS

Hi,

   The tricks I mentioned involve compressing all ~30 pieces
of text as one piece to get enough data to compress well.
Then use some method to break out one of the 30 when
decompressing.  A couple of methods come to mind.
Neither seems easy right now.  And of course add code,
making it larger.

Regards,

Steve

jj2007

Quote from: frktons on December 16, 2012, 02:40:03 AM
I know, however, that you implemented a good one in MasmBasic.
It would be nice to see it in action with the string in the pgm that
I posted in this 3D.    :t

That's feasible but doesn't make sense with such short strings. Here is a real example:

include \masm32\MasmBasic\MasmBasic.inc        ; download
        Init
        Let esi=FileRead$("\Masm32\include\Windows.inc")        ; get a string - could be from .data, too
        FileWrite "ZipTest.tmp", esi        ; write it to a file
        GetFiles "ZipTest.tmp"        ; load the Files$() array
        ZipFiles "ZipTest"        ; zip all files (it's exactly one, of course)
        Inkey "Check 'ZipTest.arc'"
        Exit
end start

Output:
FreeArc 0.666 creating archive: ZipTest.arc
Compressed 1 file, 977,412 => 145,460 bytes. Ratio 14.8%
Compression time: cpu 1.23 secs, real 1.33 secs. Speed 736 kB/s
All OK
Check 'ZipTest.arc'


For comparison: WinZip compresses to over 180k; however, the real surprise comes when you compress, say, all 1,000 assembler sources in Masm32 with FreeArc. See here for more info - attention, "Programs ... must finish the compression stage within 12h" is not a joke.

dedndave

i compressed it to 87% of the original size without writing any code   :lol:

original
QuoteMOVHLPS— Move Packed Single-Precision Floating-Point Values High to Low

MOVHLPS xmm1, xmm2

SSE

Description
-----------

This instruction cannot be used for memory to register moves.

128-bit two-argument form:
Moves two packed single-precision floating-point values from the high quadword of the second XMM argument
(second operand) to the low quadword of the first XMM register (first argument). The high quadword of the destination
operand is left unchanged. Bits (VLMAX-1:64) of the corresponding YMM destination register are unmodified.

128-bit three-argument form
Moves two packed single-precision floating-point values from the high quadword of the third XMM argument (third
operand) to the low quadword of the destination (first operand). Copies the high quadword from the second XMM
argument (second operand) to the high quadword of the destination (first operand). Bits (VLMAX-1:128) of the
destination YMM register are zeroed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
If VMOVHLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause
an #UD exception.

compressed
QuoteMOVHLPS— Move Packed Single-Precision Floating-Point Values High to Low

MOVHLPS xmm1,xmm2

SSE

Description:

cannot be used for memory to register moves

128-bit two-arguments:
Move two packed single-precision floating-point values from the high qword of the second XMM
reg operand to the low qword of the first XMM reg operand. The high qword of the destination
operand and Bits (VLMAX-1:64) of the corresponding YMM destination register are left unchanged.

128-bit three-arguments:
Move two packed single-precision floating-point values from the high qword of the third XMM reg operand
to the low qword of the first operand. Copies the high qword from the second XMM reg operand to
the high qword of the first operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
If VMOVHLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause
an #UD exception.

frktons

Quote from: Magnum on December 16, 2012, 04:46:13 AM
It's in C++ but should be easy to convert.

Andy

Thanks Andy. I don't speak C++ but I'll check it anyway.

Quote from: dedndave on December 16, 2012, 05:21:08 AM
i compressed it to 87% of the original size without writing any code   :lol:


Compliments Dave, you got the real thing :dazzled:

Quote from: jj2007 on December 16, 2012, 05:19:42 AM

That's feasible but doesn't make sense with such short strings. Here is a real example:

include \masm32\MasmBasic\MasmBasic.inc        ; download
        Init
        Let esi=FileRead$("\Masm32\include\Windows.inc")        ; get a string - could be from .data, too
        FileWrite "ZipTest.tmp", esi        ; write it to a file
        GetFiles "ZipTest.tmp"        ; load the Files$() array
        ZipFiles "ZipTest"        ; zip all files (it's exactly one, of course)
        Inkey "Check 'ZipTest.arc'"
        Exit
end start

Output:
FreeArc 0.666 creating archive: ZipTest.arc
Compressed 1 file, 977,412 => 145,460 bytes. Ratio 14.8%
Compression time: cpu 1.23 secs, real 1.33 secs. Speed 736 kB/s
All OK
Check 'ZipTest.arc'


For comparison: WinZip compresses to over 180k; however, the real surprise comes when you compress, say, all 1,000 assembler sources in Masm32 with FreeArc. See here for more info - attention, "Programs ... must finish the compression stage within 12h" is not a joke.

Thanks Jochen. Your implementation only write and read from files?
Usually all the algos use a temporary buffer. Isn't it available to your code?
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

Quote from: FORTRANS on December 16, 2012, 04:52:19 AM
Hi,

   The tricks I mentioned involve compressing all ~30 pieces
of text as one piece to get enough data to compress well.
Then use some method to break out one of the 30 when
decompressing.  A couple of methods come to mind.
Neither seems easy right now.  And of course add code,
making it larger.

Regards,

Steve

Thanks Steve. Probably it is out of my league to do something like that
in Assembly. By the way, considering the program could have dozens of
strings, it'd be useful to see how to put together the strings and to divide
them after the compression. ::)
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

FORTRANS

Quote from: frktons on December 16, 2012, 06:15:12 AM
By the way, considering the program could have dozens of
strings, it'd be useful to see how to put together the strings and to divide
them after the compression. ::)

Hi,

  One way to do it with plain text is to use control codes
to separate the strings.  Use a zero to make it easy to
print, for instance.  Then make a routine to decode the
strings, count them, and print a specific string.

HTH,

Steve N.

jj2007

Quote from: frktons on December 16, 2012, 06:11:17 AM
Thanks Jochen. Your implementation only write and read from files?
Usually all the algos use a temporary buffer. Isn't it available to your code?

Quote from: jj2007 on December 15, 2012, 07:23:38 PMThere is much better compression available, like FreeArc used in MasmBasic's ZipFiles, but the authors rarely offer (de)compression of a buffer.