The MASM Forum

General => The Campus => Topic started by: frktons on December 15, 2012, 10:58:49 AM

Title: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 10:58:49 AM
Something that puzzles me is the possibility to compress/uncompress data
inside a program, to have, for example, 30 big strings [1000-2000 bytes each]
compressed inside the program in order to use them (when needed) during
the program after decompressing them one at a time, or more at a time.

I tried (a couple of years ago) to use code instead of data, and it worked because
of the nature of data to compress:
http://www.masmforum.com/board/index.php?topic=14734.0 (http://www.masmforum.com/board/index.php?topic=14734.0)

But a general compressing/decompressing routine for text strings is out of my league.

I don't even know where to start.

I'd like to know your opinion and experiences you may have about the argument.

Frank 
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 11:11:34 AM
you can read up on LZW compression
that is what GIF and ZIP files use

there are other types of compression, as well
simple packing can reduce some files
they remove repetitious bytes with table entries and expand them from the table to decompress

another method is mathimatical
they find an area that contains fewer than "some number" of used values and compress mathimatically
for example, if you have an area of a file that has no more than 16 unique byte values,
that area may be compressed by a factor of 2
the same thing applies for other powers of 2 - that is just a simple one to envision
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 11:33:53 AM
I'd like to compress simple ASCII text strings like:

.data

string1 db "blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag",
               "blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag",
               "blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag" ....

string2 db "blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag",
               "blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag",
               "blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag" ....
......

that make the program 30-100k bigger because of them, in something much
smaller.

According to your exerience and/or knowledge, what kind of shrinking proc
should I consider fit?

The documentation online is quite unreadable [too math formulas] for me.
I need something that is more descriptive, and simpler at the same time,
I really don't need complex mathematical algos.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: Gunner on December 15, 2012, 11:56:03 AM
It could be as simple as doing something like this:
0605blabalbalbahnbagbgbgabg0303bag
for this sting:
blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag

Then you could expand the string eaisly:
0605blabalbalbahnbagbgbgabg0303bag

0605= the next 6 bytes are repeated 5 times, 0303=the next 3 bytes are repeated 3 times.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 12:17:39 PM
Quote from: Gunner on December 15, 2012, 11:56:03 AM
It could be as simple as doing something like this:
0605blabalbalbahnbagbgbgabg0303bag
for this sting:
blabalblabalblabalblabalblabalbalbahnbagbgbgabgbagbagbag

Then you could expand the string eaisly:
0605blabalbalbahnbagbgbgabg0303bag

0605= the next 6 bytes are repeated 5 times, 0303=the next 3 bytes are repeated 3 times.

Yes Gunner, this is one of the things I've meditated upon. Of course you need to indicate what
has to be duplicated in order to having it actually do something ::)

We could code it like:
Quote
blabal blabal blabal blabal blabal balbahnbagbgbgabgbagbagbag =
0105H ---> indicate that the string number 1 in a strings array ["blabal"]
has to be duplicated five times.

Probably you intended something like that, I suppose.

My apologies, you wrote it in a line of code I didn't see.  :redface:


Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 12:35:16 PM
let me seeeeee.....

i did something simple, long ago, to compress a text file inside an exe
the math case i used would allow up to 85 unique characters, which were in a translation table
each packed string had 5 bytes and unpacked into 6 characters
it used base 85 math   :P
not great compression, but it saved a few kb on a 21 kb text file and it unpacked quickly

char 1 x 1
+char 2 x 85
+char 3 x 85^2
+char 4 x 85^3
+char 5 x 85^4
+char 6 x 85^5

that adds up to a binary value that will fit into 5 bytes
but - the char was not added directly
the offset of the char in the translation table was always 0 to 84 and that was used

i used that rather than LZW because the code was very simple
in that particular case - if the code was big, nothing was gained - lol
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 12:38:08 PM
Probably a real example could help.
If I have the following text in a string, and I want to save
some bytes [half or more of its lenght], how could I do it,
without resorting to complex algos around?
Quote
MOVHLPS— Move Packed Single-Precision Floating-Point Values High to Low

MOVHLPS xmm1, xmm2

SSE

Description
-----------

This instruction cannot be used for memory to register moves.

128-bit two-argument form:
Moves two packed single-precision floating-point values from the high quadword of the second XMM argument
(second operand) to the low quadword of the first XMM register (first argument). The high quadword of the destination
operand is left unchanged. Bits (VLMAX-1:64) of the corresponding YMM destination register are unmodified.

128-bit three-argument form
Moves two packed single-precision floating-point values from the high quadword of the third XMM argument (third
operand) to the low quadword of the destination (first operand). Copies the high quadword from the second XMM
argument (second operand) to the high quadword of the destination (first operand). Bits (VLMAX-1:128) of the
destination YMM register are zeroed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
If VMOVHLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause
an #UD exception.

The text file for this string is about 1,200 bytes. What about reducing it to half?
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 12:43:47 PM
half is very ambitious
you might be doing well at 75 to 85 %
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 12:53:12 PM
Quote from: dedndave on December 15, 2012, 12:43:47 PM
half is very ambitious
you might be doing well at 75 to 85 %

Did I ever miss a goal?  :lol:
We start from 50% shrinking, and after we'll see.
For the time being I prepare the job.
Here it is the result we have to reach:
http://masm32.com/board/index.php?action=dlattach;topic=1091.0;attach=920 (http://masm32.com/board/index.php?action=dlattach;topic=1091.0;attach=920)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 12:56:05 PM
you might get 50%
it depends on the data
let me look at it....
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 12:57:14 PM
you wanna give me that as a text file ?   :biggrin:
otherwise, i hafta write a little program to convert it to plain text
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 01:12:10 PM
i got it...
:lol:

QuoteMOVHLPS— Move Packed Single-Precision Floating-Point Values High to Low

MOVHLPS xmm1, xmm2

SSE

Description
-----------

This instruction cannot be used for memory to register moves.

128-bit two-argument form:
Moves two packed single-precision floating-point values from the high quadword of the second XMM argument
(second operand) to the low quadword of the first XMM register (first argument). The high quadword of the destination
operand is left unchanged. Bits (VLMAX-1:64) of the corresponding YMM destination register are unmodified.

128-bit three-argument form
Moves two packed single-precision floating-point values from the high quadword of the third XMM argument (third
operand) to the low quadword of the destination (first operand). Copies the high quadword from the second XMM
argument (second operand) to the high quadword of the destination (first operand). Bits (VLMAX-1:128) of the
destination YMM register are zeroed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
If VMOVHLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause
an #UD exception.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 01:14:03 PM
i'd be surprised if you can get that down to 50%

actually - a ZIP file of it is 53%
it looks like LZW is a good choice, as that goes
however, the LZW decompression code is likely to be a few hundred bytes   :P
that's if you use your own code
now - there is probably a way to use the OS unzipper to decompress it for you
never tried that - but i am sure someone in here has
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 01:17:02 PM
Quote from: dedndave on December 15, 2012, 01:14:03 PM
i'd be surprised if you can get that down to 50%

It'll take some time, but we'll do it, as usual  :lol:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 01:18:53 PM
so - zip it
put it in the EXE as a raw binary resource
use a routine like the one i posted last week to get data from a resource
figure out how to use the OS's unzipper
and you are there
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 01:20:51 PM
http://masm32.com/board/index.php?topic=967.msg8697#msg8697 (http://masm32.com/board/index.php?topic=967.msg8697#msg8697)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 01:27:00 PM
RtlDecompressBuffer
http://msdn.microsoft.com/en-us/library/windows/hardware/ff552191%28v=vs.85%29.aspx (http://msdn.microsoft.com/en-us/library/windows/hardware/ff552191%28v=vs.85%29.aspx)

RtlCompressBuffer
http://msdn.microsoft.com/en-us/library/windows/hardware/ff552127%28v=vs.85%29.aspx (http://msdn.microsoft.com/en-us/library/windows/hardware/ff552127%28v=vs.85%29.aspx)


ok
use RtlCompressBuffer to create a raw binary
add it to your program as a resource
use the LoadFromRsrc routine i posted above to get it into a buffer
use RtlDecompressBuffer to decompress it

BANG - 50%
which is, you guessed it, $50

of course, you could just create a DB list like you have of the compressed binary
then, you don't need my routine - just put it in the .DATA section
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 01:41:39 PM
you will want to use
        INCLUDE    \masm32\include\masm32rt.inc
        INCLUDE    \masm32\include\ntoskrnl.inc
        INCLUDELIB \masm32\lib\ntoskrnl.lib
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 01:49:12 PM
Yes Dave, I've thought about these systems as well.

But where is the fun in using precooked meals?

Here we have something that will help in the process of
building our own code.  :lol:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 01:50:35 PM
ok
write your own code
hopefully, it is smaller that the 606 bytes you are trying to save   :biggrin:
that's the advantage of using the API function
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 01:54:20 PM
Quote from: dedndave on December 15, 2012, 01:50:35 PM
ok
write your own code
hopefully, it is smaller that the 606 bytes you are trying to save   :biggrin:
that's the advantage of using the API function

The point is, I'm thinking about multiple long strings, so the code will
earn its $50% many times inside the program.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: nidud on December 15, 2012, 01:56:36 PM
deleted
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 02:01:37 PM
Hi nidud.

Will you post also the algo to obtain:

bcode db 0,6,5*6 ; 'blabalblabalblabalblabalblabal'
db -3,3,3 ; 'bal'
db -3,2,2 ; 'ba'
db 6,2,2 ; 'hn'
db -4,2,2 ; 'ba'
db 8,3,3 ; 'gbg'
db -2,2,2 ; 'bg'
db 11,3,3 ; 'abg'
db -10,3,3*3 ; 'bagbagbag'
db -56,56,2*56 ; repeat line * 2
db -1 ; 45 byte


I miss the compression phase.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 02:03:58 PM
LZW data compression is not rocket science
basically, you break up the data and replace sections with tokens
in the token table, you put the original "string"
i think the token table can only hold something like ~254 tokens
when it gets full, and you need to create a new token, you trash it and start a new token table
the tokens then replace the strings in the compressed data stream

so - you can write your own if you like
there are also pre-written libraries like gzip, etc
i seem to recall someone making a LIB and INC for gzip a while back
don't remember if it was in the new forum or the old one
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 02:09:43 PM
Quote from: dedndave on December 15, 2012, 02:03:58 PM
LZW data compression is not rocket science
so - you can write your own if you like
there are also pre-written libraries like gzip, etc
i seem to recall someone making a LIB and INC for gzip a while back
don't remember if it was the new forum or the old one

Probably it was on the old forum. By the way, as you say I can write a simplified
algo that suits my needs. I've thought about it during the last 2 years, when I
remembered the matter, not very often, but I've got some ideas to try.
And I think writing some code shouldn't hurt in the process of learning a bit of
Assembly :lol:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 02:12:01 PM
i updated my previous post
LZW compression...

Quotebasically, you break up the data and replace sections with tokens
in the token table, you put the original "string"
i think the token table can only hold something like ~254 tokens
when it gets full, and you need to create a new token, you trash it and start a new token table
the tokens then replace the strings in the compressed data stream
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 02:19:16 PM
I think the algo was translated by Jochen:
http://www.masmforum.com/board/index.php?topic=15470.0 (http://www.masmforum.com/board/index.php?topic=15470.0)

Quote from: dedndave on December 15, 2012, 02:12:01 PM
LZW compression...

basically, you break up the data and replace sections with tokens
in the token table, you put the original "string"
i think the token table can only hold something like ~254 tokens
when it gets full, and you need to create a new token, you trash it and start a new token table
the tokens then replace the strings in the compressed data stream

Yes Dave, this is one of the few things I understand about compression methods.
And I think it'll be enough for the time being.   :t
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: nidud on December 15, 2012, 02:19:38 PM
deleted
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 15, 2012, 02:32:13 PM
yes - the trick is to pick good strings to tokenize
token selection should match the type of data
that's the key to getting good compression ratios

plain text like this is probably the easiest data type to work with (except for a shit-load of zeros)
it's somewhat predictable   :P
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 15, 2012, 02:35:02 PM
Quote from: nidud on December 15, 2012, 02:19:38 PM

:biggrin:

There are many algos to do that
You scan the output buffer for duplicate strings

The most common is to use { WORD offset, BYTE length }
The minimum string length is then 3 byte

The string buffer is also converted to bits:
a = 0
b = 10
l = 1100
h = 1101
n = 1110
g = 1111

bla = 10 1100 0 = 7 bits


OK nidud, if you feel to do it, show me your idea inside the code
I posted, and tell me what % you get shrinking the string.

Quote from: dedndave on December 15, 2012, 02:32:13 PM
yes - the trick is to pick good strings to tokenize
token selection should match the type of data
that's the key to getting good compression ratios

That's also the most difficult thing to do in my opinion.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: MichaelW on December 15, 2012, 03:52:41 PM
For that 1202 byte data block Winzip can compress it to 523 bytes (in a 641-byte archive), and RtlCompressBuffer can compress it to 666 bytes at standard compression and 659 bytes at maximum compression. I think run-length encoding will not compress it enough to be worth the effort.

In the Data Compression Handbook, Second Edition, Mark Nelson and Jean-Loup Gailly, 1996, M&T books, ISBN 1-55851-434-1, order 3 arithmetic coding produced the highest compression ratio for text files, as well as the highest overall ratio across text, graphics, and executable files, tested against Huffman, Adaptive Huffman, and several variants each of LZSS and LZW.

Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: Magnum on December 15, 2012, 04:09:37 PM
I have some code that has the ability to zip/unzip directly in memory without any intermediate files.

Is that something that would be of help to you ?

Andy
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 15, 2012, 07:23:38 PM
Quote from: MichaelW on December 15, 2012, 03:52:41 PM
For that 1202 byte data block Winzip can compress it to 523 bytes (in a 641-byte archive), and RtlCompressBuffer can compress it to 666 bytes

Probably the best alternative to reinventing the wheel.

RtlCompressBuffer: Microsoft started to document it, as well as many other API, with the new SDK for Seven. (http://www.jose.it-berater.org/smfforum/index.php?topic=3467.0)

There is much better compression available, like FreeArc used in MasmBasic's ZipFiles, but the authors rarely offer (de)compression of a buffer.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 02:28:43 AM
Quote from: Magnum on December 15, 2012, 04:09:37 PM
I have some code that has the ability to zip/unzip directly in memory without any intermediate files.

Is that something that would be of help to you ?

Andy

Yes Andy, it is this kind of routine that I need to implement.
Thanks
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 02:40:03 AM
Quote from: jj2007 on December 15, 2012, 07:23:38 PM
RtlCompressBuffer
Probably the best alternative to reinventing the wheel.

There is much better compression available, like FreeArc used in MasmBasic's ZipFiles, but the authors rarely offer (de)compression of a buffer.

If I don't reinvent the wheel it is hard for me to learn some Assembly. :lol:
Good sort/zip/unzip algos are quite difficult to write, and a good dose of math
is usually necessary. Unfortunately I missed my dose.  ::)

I know, however, that you implemented a good one in MasmBasic.
It would be nice to see it in action with the string in the pgm that
I posted in this 3D.    :t
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: FORTRANS on December 16, 2012, 02:53:23 AM
Hi,

   Some data compression ratio data for text from "The
Data Compression Book", Mark Nelson, 1991, shows
Huffman about 40, Arithmetic about 40 to 70, LZSS about
55 to 60, and LZW about 50 to 60.  Higher numbers are
better ratios.  I did LZW for a GIF creation program and
part of a Huffman compressor to check the logic.  A
quick glance at my code shows the LZW at maybe three
times the size of the Huffman.  Hey guys, how would you
relate complexity to code size?

   MichaelW, my edition has 527 pages and 12 chapters, and
two appendices.  How does that compare to your copy?  He
did not have a chapter on Run Length Encoding.

   Dave, LZW for GIF files allow for 4,096 tokens, give or
take, some are predefined.  Which is 12 bits maximum.
The book above did up to 15 bit LZW.

   Frank, your text is short enough that straight LZW is
probably a poor match.  The more the token table is full
the better it compresses.  If have a lot of short pieces of
text, there are some "tricks" you can try to improve things.
Huffman is somewhat less size dependent for performance
(bigger is still better), but requires the decode table in the
compressed data hurting smaller files.

Regards,

Steve N.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 03:51:22 AM
Quote from: FORTRANS on December 16, 2012, 02:53:23 AM
   Frank, your text is short enough that straight LZW is
probably a poor match. 

Steve N.

I suppose you are right, Steve  :t

I think I'll use RtlCompressBuffer just to have some
algo to confront with, and afterwhile I'm going to try some
homemade code.

The task in itself is almost impossible to achieve with this
kind of text, but I'll give it a shot nevertheless.  :dazzled:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: Magnum on December 16, 2012, 04:46:13 AM
Too big to post code, so there is an attachment.

It's in C++ but should be easy to convert.

Andy
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: FORTRANS on December 16, 2012, 04:52:19 AM
Hi,

   The tricks I mentioned involve compressing all ~30 pieces
of text as one piece to get enough data to compress well.
Then use some method to break out one of the 30 when
decompressing.  A couple of methods come to mind.
Neither seems easy right now.  And of course add code,
making it larger.

Regards,

Steve
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 05:19:42 AM
Quote from: frktons on December 16, 2012, 02:40:03 AM
I know, however, that you implemented a good one in MasmBasic.
It would be nice to see it in action with the string in the pgm that
I posted in this 3D.    :t

That's feasible but doesn't make sense with such short strings. Here is a real example:

include \masm32\MasmBasic\MasmBasic.inc        ; download (http://masm32.com/board/index.php?topic=94.0)
        Init
        Let esi=FileRead$("\Masm32\include\Windows.inc")        ; get a string - could be from .data, too
        FileWrite "ZipTest.tmp", esi        ; write it to a file
        GetFiles "ZipTest.tmp"        ; load the Files$() array
        ZipFiles "ZipTest"        ; zip all files (it's exactly one, of course)
        Inkey "Check 'ZipTest.arc'"
        Exit
end start

Output:
FreeArc 0.666 creating archive: ZipTest.arc
Compressed 1 file, 977,412 => 145,460 bytes. Ratio 14.8%
Compression time: cpu 1.23 secs, real 1.33 secs. Speed 736 kB/s
All OK
Check 'ZipTest.arc'


For comparison: WinZip compresses to over 180k; however, the real surprise comes when you compress, say, all 1,000 assembler sources in Masm32 with FreeArc. See here for more info (http://www.maximumcompression.com/data/summary_mf2.php) - attention, "Programs ... must finish the compression stage within 12h" is not a joke.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 16, 2012, 05:21:08 AM
i compressed it to 87% of the original size without writing any code   :lol:

original
QuoteMOVHLPS— Move Packed Single-Precision Floating-Point Values High to Low

MOVHLPS xmm1, xmm2

SSE

Description
-----------

This instruction cannot be used for memory to register moves.

128-bit two-argument form:
Moves two packed single-precision floating-point values from the high quadword of the second XMM argument
(second operand) to the low quadword of the first XMM register (first argument). The high quadword of the destination
operand is left unchanged. Bits (VLMAX-1:64) of the corresponding YMM destination register are unmodified.

128-bit three-argument form
Moves two packed single-precision floating-point values from the high quadword of the third XMM argument (third
operand) to the low quadword of the destination (first operand). Copies the high quadword from the second XMM
argument (second operand) to the high quadword of the destination (first operand). Bits (VLMAX-1:128) of the
destination YMM register are zeroed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
If VMOVHLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause
an #UD exception.

compressed
QuoteMOVHLPS— Move Packed Single-Precision Floating-Point Values High to Low

MOVHLPS xmm1,xmm2

SSE

Description:

cannot be used for memory to register moves

128-bit two-arguments:
Move two packed single-precision floating-point values from the high qword of the second XMM
reg operand to the low qword of the first XMM reg operand. The high qword of the destination
operand and Bits (VLMAX-1:64) of the corresponding YMM destination register are left unchanged.

128-bit three-arguments:
Move two packed single-precision floating-point values from the high qword of the third XMM reg operand
to the low qword of the first operand. Copies the high qword from the second XMM reg operand to
the high qword of the first operand. Bits (VLMAX-1:128) of the destination YMM register are zeroed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
If VMOVHLPS is encoded with VEX.L= 1, an attempt to execute the instruction encoded with VEX.L= 1 will cause
an #UD exception.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 06:11:17 AM
Quote from: Magnum on December 16, 2012, 04:46:13 AM
It's in C++ but should be easy to convert.

Andy

Thanks Andy. I don't speak C++ but I'll check it anyway.

Quote from: dedndave on December 16, 2012, 05:21:08 AM
i compressed it to 87% of the original size without writing any code   :lol:


Compliments Dave, you got the real thing :dazzled:

Quote from: jj2007 on December 16, 2012, 05:19:42 AM

That's feasible but doesn't make sense with such short strings. Here is a real example:

include \masm32\MasmBasic\MasmBasic.inc        ; download (http://masm32.com/board/index.php?topic=94.0)
        Init
        Let esi=FileRead$("\Masm32\include\Windows.inc")        ; get a string - could be from .data, too
        FileWrite "ZipTest.tmp", esi        ; write it to a file
        GetFiles "ZipTest.tmp"        ; load the Files$() array
        ZipFiles "ZipTest"        ; zip all files (it's exactly one, of course)
        Inkey "Check 'ZipTest.arc'"
        Exit
end start

Output:
FreeArc 0.666 creating archive: ZipTest.arc
Compressed 1 file, 977,412 => 145,460 bytes. Ratio 14.8%
Compression time: cpu 1.23 secs, real 1.33 secs. Speed 736 kB/s
All OK
Check 'ZipTest.arc'


For comparison: WinZip compresses to over 180k; however, the real surprise comes when you compress, say, all 1,000 assembler sources in Masm32 with FreeArc. See here for more info (http://www.maximumcompression.com/data/summary_mf2.php) - attention, "Programs ... must finish the compression stage within 12h" is not a joke.

Thanks Jochen. Your implementation only write and read from files?
Usually all the algos use a temporary buffer. Isn't it available to your code?
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 06:15:12 AM
Quote from: FORTRANS on December 16, 2012, 04:52:19 AM
Hi,

   The tricks I mentioned involve compressing all ~30 pieces
of text as one piece to get enough data to compress well.
Then use some method to break out one of the 30 when
decompressing.  A couple of methods come to mind.
Neither seems easy right now.  And of course add code,
making it larger.

Regards,

Steve

Thanks Steve. Probably it is out of my league to do something like that
in Assembly. By the way, considering the program could have dozens of
strings, it'd be useful to see how to put together the strings and to divide
them after the compression. ::)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: FORTRANS on December 16, 2012, 08:23:38 AM
Quote from: frktons on December 16, 2012, 06:15:12 AM
By the way, considering the program could have dozens of
strings, it'd be useful to see how to put together the strings and to divide
them after the compression. ::)

Hi,

  One way to do it with plain text is to use control codes
to separate the strings.  Use a zero to make it easy to
print, for instance.  Then make a routine to decode the
strings, count them, and print a specific string.

HTH,

Steve N.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 09:48:28 AM
Quote from: frktons on December 16, 2012, 06:11:17 AM
Thanks Jochen. Your implementation only write and read from files?
Usually all the algos use a temporary buffer. Isn't it available to your code?

Quote from: jj2007 on December 15, 2012, 07:23:38 PMThere is much better compression available, like FreeArc used in MasmBasic's ZipFiles, but the authors rarely offer (de)compression of a buffer.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 16, 2012, 10:07:46 AM
something i tried long ago was a dictionary of words
each word then had an index number
you create the text by using a series of index numbers
i was surprised at how fast it was   :P
you can use the text for some words more than once
for example, the string "aren't" can be used for "a", "are", "aren't"
you can also use the "n't" to make other contractions like "doesn't", "don't", "isn't", etc
of course, if the index is longer than the string, the advantage dwindles a bit
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 10:50:41 AM
I hacked together an example with my favourite test string, i.e. the contents of \Masm32\include\Windows.inc, using RtlCompressBuffer

The good news: It works.
The bad news: Its performance is hilarious, both in terms of compression time and compression ratio (you read this in Redmond, don't you?  :greensml:).

RtlGetCompressionWorkSpaceSize OK
edx             4096
$Err$()         Operation completed

RtlCompressBuffer OK    $Err$()         Operation completed

Bytes written vs original size
CompSize        285782   ; << for comparison: WinZip 193990, FreeArc 146200 bytes
LastFileSize    977412

FinalUncompressedSize
ecx             977412
eax             0
$Err$()         Operation completed

include \masm32\MasmBasic\MasmBasic.inc        ; download (http://masm32.com/board/index.php?topic=94.0)
uselib ntdll                ; MSDN (http://msdn.microsoft.com/en-us/library/windows/hardware/ff552127%28v=vs.85%29.aspx)

.data?
WorkSpace        db 4096 dup(?)        ;
CompSize        dd ?

  Init
  Let esi=FileRead$("\Masm32\include\Windows.inc")        ; sets LastFileSize
  Let edi=New$(LastFileSize)
  Let ebx=New$(LastFileSize)
  sub esp, 8                ; create two dword slots on stack
;  COMPRESSION_FORMAT_LZNT1=2        defined by Masm32, OK
  COMPRESSION_FORMAT_XPRESS=3        ; no luck under XP SP3
  COMPRESSION_FORMAT_XPRESS_HUFF=4        ; no luck under XP SP3
  ; STANDARD is much faster but compression ration is even more ridiculous
  FormatEngine=COMPRESSION_FORMAT_LZNT1 or COMPRESSION_ENGINE_MAXIMUM
  invoke RtlGetCompressionWorkSpaceSize, FormatEngine,
  esp, esp
  pop edx                        ; CompressBufferWorkSpaceSize, 4096
  pop ecx                        ; CompressFragmentWorkSpaceSize, 0
  .if eax!=STATUS_SUCCESS
        push eax
        Print "RtlGetCompressionWorkSpaceSize: ", Err$()
        pop ecx
        SWITCH ecx
        CASE STATUS_NOT_SUPPORTED
                PrintLine "STATUS_NOT_SUPPORTED"
        CASE STATUS_INVALID_PARAMETER
                PrintLine "STATUS_INVALID_PARAMETER"
        CASE STATUS_UNSUPPORTED_COMPRESSION
                PrintLine "STATUS_UNSUPPORTED_COMPRESSION"
        DEFAULT
                PrintLine "STATUS_UNKNOWN"
        ENDSW
  .else
        deb 4, "RtlGetCompressionWorkSpaceSize OK", edx, $Err$()
        invoke RtlCompressBuffer, FormatEngine,
          esi, LastFileSize,
          edi, LastFileSize,
          4096, addr CompSize, addr WorkSpace
        .if eax!=S_OK
                deb 4, "RtlCompressBuffer FAILED", $Err$()
        .else
                deb 4, "RtlCompressBuffer OK", $Err$()
                Open "O", #1, "compressed.zip"
                PrintBuffer #1, edi, CompSize
                Close #1
                deb 4, "Bytes written vs original size", CompSize, LastFileSize

                push eax
                invoke SetLastError, 0
                invoke RtlDecompressBuffer, COMPRESSION_FORMAT_LZNT1,        ; MSDN (http://msdn.microsoft.com/en-us/library/windows/hardware/ff552191%28v=vs.85%29.aspx): This parameter must be set to COMPRESSION_FORMAT_LZNT1
                  ebx, LastFileSize,
                  edi, LastFileSize,
                  esp
                pop ecx        ; FinalUncompressedSize
                deb 4, "FinalUncompressedSize", ecx, eax, $Err$()
                Open "O", #1, "decompressed.inc"
                PrintBuffer #1, ebx, ecx
                Close #1
        .endif
  .endif
  Inkey "ok?"
  Exit
end start
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 12:01:04 PM
Quote from: jj2007 on December 16, 2012, 09:48:28 AM

There is much better compression available, like FreeArc used in MasmBasic's ZipFiles, but the authors rarely offer (de)compression of a buffer.

I already read it Jochen and remember it as well, but I wonder what
prevents you from intercepting the buffer from which you write to the file,
I mean the routine you use. Of course I don't know how much code
you had to write to use FreeArc, but I imagine that if you can write to
an output file, it has to be from a buffer.
FreeArc appears to be open-source, so if you didn't implement the buffer
version is because you don't need it, or [as the author of MasmBasic] you
don't like to offer the decompression of a buffer?

P.S. spero non ti senta sotto interrogatorio, è solo curiosità informatica.  :P

Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 12:13:48 PM
Quote from: jj2007 on December 16, 2012, 10:50:41 AM
I hacked together an example with my favourite test string, i.e. the contents of \Masm32\include\Windows.inc, using RtlCompressBuffer

The good news: It works.
The bad news: Its performance is hilarious, both in terms of compression time and compression ratio (you read this in Redmond, don't you?  :greensml:).

The most hilarious stuff would be not being able to outperform it
after months of reinventing the wheel :lol:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 12:16:16 PM
Quote from: dedndave on December 16, 2012, 10:07:46 AM
something i tried long ago was a dictionary of words
each word then had an index number
you create the text by using a series of index numbers
i was surprised at how fast it was   :P
you can use the text for some words more than once
for example, the string "aren't" can be used for "a", "are", "aren't"
you can also use the "n't" to make other contractions like "doesn't", "don't", "isn't", etc
of course, if the index is longer than the string, the advantage dwindles a bit

This is one of the idea I have in mind. But going from the ideas to a
working code is a completely new world. :lol:

I tried the FreeArc GUI version 0.666 [the last one so far] and you know what?
It compresses the text file 1202 bytes with a worse compression ratio than the
Windows zipper. So I understand why it is better in this particular case to use
the Windows RtlCompress than any other better algo around. For bigger strings it is
different. With my future algo? Some more months and I'll know.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 16, 2012, 12:53:33 PM
i would think the call to RtlGetCompressionWorkSpaceSize is unnecessary and slow
maybe you could just allocate an ample buffer and avoid using that function

although, the compression time is not a big issue, really
you are only going to compress them one time   :P
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: MichaelW on December 16, 2012, 01:10:36 PM
Steve,

I have both of the books. The second is about 550 pages. I have not done any close comparison of the two because I use only the newer one, but they both cover roughly the same material.

Frank,

If you will be compressing the data once and expanding it many times, then I think arithmetic coding would be a good choice. For arithmetic coding the compression rates are low (or very low compared to Huffman and the LZW variants) but the expansion rates for the higher-order models are very high.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 01:16:35 PM
Quote from: dedndave on December 16, 2012, 12:53:33 PM
i would think the call to RtlGetCompressionWorkSpaceSize is unnecessary and slow
maybe you could just allocate an ample buffer and avoid using that function

although, the compression time is not a big issue, really
you are only going to compress them one time   :P

You are right Master. The idea is to have them already compressed
inside the program. But somehow (if I don't use RtlCompress) I have
to invent a customized algo to shrink/unpack the data with my own
routine as well.

Quote from: MichaelW on December 16, 2012, 01:10:36 PM

Frank,

If you will be compressing the data once and expanding it many times, then I think arithmetic coding would be a good choice. For arithmetic coding the compression rates are low (or very low compared to Huffman and the LZW variants) but the expansion rates for the higher-order models are very high.


Thanks Michael, that's something to keep in mind.  :t
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 16, 2012, 01:26:55 PM
the decompress code is fairly easy compared to the compress code   :P
i have some ideas on this, but i have another project going that i want to finish before i pick up a new one
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 01:56:29 PM
Quote from: dedndave on December 16, 2012, 01:26:55 PM
the decompress code is fairly easy compared to the compress code   :P
i have some ideas on this, but i have another project going that i want to finish before i pick up a new one

This project could take months to be accomplished...you'll probably find me
here with a new reinvented wheel under construction when you're back.

My overall idea for compressing data:

Quote
1. Scan the text buffer to shrink and see how many:

    a. ASCII codes are used
    b. 0-9 ASCII codes
    c. A-Z  "         "
    d. a-z  "         "
    e. spaces
    f.  CRLF
    g.  special chars [,.:-_[]()!£$%&/|\?'^*+@#....]
    h.  tokens with n bytes size
    j.   occurrences of duplicated tokens for each size found
    k.  and other things that will be necessary

2.  Estimate how much the text could be shrinked considering that:

    a. all spaces will be removed [a flag will tell the proc when inserting one/more of them]
    b. CRLF can be coded as 1 byte token
    c. repeated tokens can be substituted with the address [1-2 bytes max]
       of the token array/s
    d. repeated chars ["------------"] can be coded with appropriate instructions
    e. Uppercase can be temporarily converted to lowercase
    f. duplicated patterns can be coded conveniently
    g. most frequent ASCII codes can be coded as 3-4 bits to save bytes
    h. and so on

3.  Find a convenient way, both in speed and size, to accomplish the task

4.  Test many different routines and choose what seems to suits best the
     task.

5.  Pay $50 to Dave
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 02:00:44 PM
Quote from: frktons on December 16, 2012, 12:01:04 PM
what prevents you from intercepting the buffer from which you write to the file

But that's the point: I have to write it to a file, because FreeArc needs a file, not a pointer to a buffer.

Quote from: dedndave on December 16, 2012, 12:53:33 PM
i would think the call to RtlGetCompressionWorkSpaceSize is unnecessary and slow

I timed it, Dave ;-)
RtlGetCompressionWorkSpaceSize took 0 ms
RtlCompressBuffer took 2393 ms
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: Donkey on December 16, 2012, 02:15:47 PM
If I was to write a simple codec for string files I would begin by reducing every ASCII character to 7 bits with 0000000 as a flag character for non standard cases. This is ofcouse lossless as ASCII is only 7 bits anyway and the loss of zero is not significant. For repeat characters any 3 repeated characters or greater can be replaced by 0000000:nRepeats:char where nRepeats would be 6 bits preceded by 0 (allowing up to 63 repeats) and char would be 7 bits. By reserving the most significant bit in nRepeats you can add special cases by turning the bit on and using the least significant 6 bits to specify the operation to be performed.

Note that this only works on text files using ASCII and not on ANSI (Windows 1252) files.

This is just an idea off the top of my head for a simple text compression algorithm that should be pretty darn fast and though it wouldn't approach the compression rates of any modern algorithm it would be quick and easy enough to write.

Edgar
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 02:21:54 PM
Quote from: jj2007 on December 16, 2012, 02:00:44 PM

But that's the point: I have to write it to a file, because FreeArc needs a file, not a pointer to a buffer.


Do you speak C/C++? Most probably yes. You have the source.
In my experience it only means that you don't need or don't like
to modify the 2-3[hundred] instructions needed for implementing it.

As they say: we can. We can write it from scratch, modify it, cheat to it,
bending it to our needs, if we really want to.

Anyway, you have your reasons, and I respect them. You've already done
an impressive amount of work to build MasmBasic, so: "far from me" the
idea you should do some extra work.  ;)

Quote from: Donkey on December 16, 2012, 02:15:47 PM
If I was to write a simple codec for string files I would begin by reducing every ASCII character to 7 bits with 0000000 as a flag character for non standard cases. This is ofcouse lossless as ASCII is only 7 bits anyway and the loss of zero is not significant. For repeat characters any 3 repeated characters or greater can be replaced by 0000000:nRepeats:char where nRepeats would be 6 bits preceded by 0 (allowing up to 63 repeats) and char would be 7 bits. By reserving the most significant bit in nRepeats you can add special cases by turning the bit on and using the least significant 6 bits to specify the operation to be performed.

Note that this only works on text files using ASCII and not on ANSI (Windows 1252) files.

This is just an idea off the top of my head for a simple text compression algorithm that should be pretty darn fast and though it wouldn't approach the compression rates of any modern algorithm it would be quick and easy enough to write.

Edgar

Thanks Edgar, nice idea to meditate upon.  :t
Unfortunately I'm considering files with ANSI code, so I've to use all the
bits to represent some special chars.

It could happen, however, that some strings don't need special chars, so
it is the case in which your idea could be implemented.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 02:36:06 PM
Quote from: frktons on December 16, 2012, 02:21:54 PM
Quote from: jj2007 on December 16, 2012, 02:00:44 PM
But that's the point: I have to write it to a file, because FreeArc needs a file, not a pointer to a buffer.

Do you speak C/C++? Most probably yes. You have the source.

Yes, the source is available here (http://netcologne.dl.sourceforge.net/project/sevenzip/LZMA%20SDK/lzma920.tar.bz2). If you can convince your C compiler to translate LZMA\C\Util\Lzma\LzmaUtil.c into a DLL or LIB, then I will offer your the CompressBuffer function.

My Visual Studio 2010 refuses to convert any "project" older than two years or so, sometimes with loads of cryptic error messages, sometimes with a long silence. Pelles C utters "internal errors" without specifying what's wrong, so it's the usual C mess. But try your luck.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 02:42:41 PM
Quote from: Donkey on December 16, 2012, 02:15:47 PMFor repeat characters any 3 repeated characters or greater can be replaced by 0000000:nRepeats:char where nRepeats would be 6 bits preceded by 0 (allowing up to 63 repeats) and char would be 7 bits.

Go a step further and set the rule "everything above 127 is an index into the dictionary of frequently used words"

So the three-byte string chr$(3+128, 32, 7+128) becomes <invoke lstrcpy>, 14 bytes

1 mov
2 add
3 invoke
4 proc
5 endp
6 call
7 lstrcpy

This is more or less the approach of archivers nowadays - find long frequently used byte sequences.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: Donkey on December 16, 2012, 02:49:10 PM
For that you could set the most significant bit of the nRepeats character followed by a 7 bit index in the char section. For example "mov eax,0" would reduce to the following given that an index entry is command #1 and it was the first entry in the index:

0000000:1000001:0000001

All index entries would reduce to 21 bits total length regardless of size. (plus the actual index entry itself and the lookup table)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 16, 2012, 03:05:36 PM
Quote from: jj2007 on December 16, 2012, 02:36:06 PM

Yes, the source is available here (http://netcologne.dl.sourceforge.net/project/sevenzip/LZMA%20SDK/lzma920.tar.bz2). If you can convince your C compiler to translate LZMA\C\Util\Lzma\LzmaUtil.c into a DLL or LIB, then I will offer your the CompressBuffer function.

My Visual Studio 2010 refuses to convert any "project" older than two years or so, sometimes with loads of cryptic error messages, sometimes with a long silence. Pelles C utters "internal errors" without specifying what's wrong, so it's the usual C mess. But try your luck.

And now I start to understand your reasons. Probably they use a port of GCC for
Windows. I'm not going to ask you to write the CompressBuffer function, there is
no reason for that.
If you need to know what compiler they used, for any reason, I think it is quite easy
for you to find out. You know that better than me.  :t

Quote from: jj2007 on December 16, 2012, 02:42:41 PM
Quote from: Donkey on December 16, 2012, 02:15:47 PMFor repeat characters any 3 repeated characters or greater can be replaced by 0000000:nRepeats:char where nRepeats would be 6 bits preceded by 0 (allowing up to 63 repeats) and char would be 7 bits.

Go a step further and set the rule "everything above 127 is an index into the dictionary of frequently used words"

So the three-byte string chr$(3+128, 32, 7+128) becomes <invoke lstrcpy>, 14 bytes

1 mov
2 add
3 invoke
4 proc
5 endp
6 call
7 lstrcpy

This is more or less the approach of archivers nowadays - find long frequently used byte sequences.

Do you remember the simple ASCII program I wrote 2 years ago?
One of the routine drew boxes on the screen, and another displayed
all the numbers from 1 to 255 with the chars associated. In that case a couple of
procs created data about 1200*4 bytes long, and the routines themselves
where not bigger than 100 bytes.

In my opinion if you use a standard optimized algo you can have some nice
performances in the average cases, but sometime you have to decide alternative
ways, if you don't mind doing it and are not in a hurry. 
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: qWord on December 16, 2012, 04:24:01 PM
Quote from: jj2007 on December 16, 2012, 02:36:06 PMYes, the source is available here (http://netcologne.dl.sourceforge.net/project/sevenzip/LZMA%20SDK/lzma920.tar.bz2). If you can convince your C compiler to translate LZMA\C\Util\Lzma\LzmaUtil.c into a DLL or LIB, then I will offer your the CompressBuffer function.

My Visual Studio 2010 refuses to convert any "project" older than two years or so, sometimes with loads of cryptic error messages, sometimes with a long silence. Pelles C utters "internal errors" without specifying what's wrong, so it's the usual C mess. But try your luck.
you need to compile also the modules in LZMA\C. Compiled, but not tested version in the attachment.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: hutch-- on December 16, 2012, 05:55:56 PM
If general purpose compression is what you are after and you are not trying to design this from scratch yourself, have a look at the masm32 example that uses Jibz's aPlib compression library. A quick scruffy test on WINDOWS.INC compressed it from 977426 bytes down to 207882 bytes. It is primarily designed for binary compression but works fine on plain text.

exampl02\appack\appack.asm

Its all there and works.

Here is a link to Jibz's site where you can download his compression software legally and for free.

http://www.ibsensoftware.com/index.html

Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 08:28:09 PM
Quote from: qWord on December 16, 2012, 04:24:01 PM
you need to compile also the modules in LZMA\C. Compiled, but not tested version in the attachment.
Thanks. With PeView, it throws an exception, unfortunately.

In the meantime, I could build the lib (attached) with Pelles C with some "warning #2117: Old-style function definition for 'GetError'", plus one that looks a bit more serious (see attachment):
D:\LZMA_SDK\C\Threads.c(37): warning #2018: Undeclared function '_beginthreadex'; assuming 'extern' returning 'int'. But I have no clue how to get from there to the \util result. There are two makefiles around, see attachment, but Pelles C help does not know the word "makefile" ::)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 16, 2012, 08:41:42 PM
Quote from: hutch-- on December 16, 2012, 05:55:56 PM.. Jibz's aPlib compression library. A quick scruffy test on WINDOWS.INC compressed it from 977426 bytes down to 207882 bytes.

Hutch,

Sure it works, but as a forum of "aliens" we are morally obliged to achieve the best compression rate the world has ever seen, with small & fast assembler code of course ;-)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 16, 2012, 09:35:11 PM
aliens - you mean like sergiu...
(http://imageshack.us/a/img257/3711/sergiu.jpg)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 17, 2012, 02:57:16 AM
Quote from: hutch-- on December 16, 2012, 05:55:56 PM
If general purpose compression is what you are after and you are not trying to design this from scratch yourself, have a look at the masm32 example that uses Jibz's aPlib compression library. A quick scruffy test on WINDOWS.INC compressed it from 977426 bytes down to 207882 bytes. It is primarily designed for binary compression but works fine on plain text.

exampl02\appack\appack.asm

Its all there and works.

Here is a link to Jibz's site where you can download his compression software legally and for free.

http://www.ibsensoftware.com/index.html



Thanks Hutch, I suspected there was something like that packed inside MASM32.

Quote from: jj2007 on December 16, 2012, 08:41:42 PM

Hutch,

Sure it works, but as a forum of "aliens" we are morally obliged to achieve the best compression rate the world has ever seen, with small & fast assembler code of course ;-)

:dazzled:  :t

Quote from: dedndave on December 16, 2012, 09:35:11 PM
aliens - you mean like sergiu...
(http://imageshack.us/a/img257/3711/sergiu.jpg)

:greenclp:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 17, 2012, 04:05:06 AM
don't know if you remember him, Frank
he promised us 100 to 1 compression - lol

http://www.masmforum.com/board/index.php?topic=13454.0 (http://www.masmforum.com/board/index.php?topic=13454.0)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: wjr on December 17, 2012, 05:31:21 AM
Quote from: jj2007 on December 16, 2012, 08:28:09 PM
Thanks. With PeView, it throws an exception, unfortunately.

The file LzmaUtil.dll has long section names (debug related) which, although allowed in OBJ files, shouldn't be making it into an EXE/DLL file (according to the Microsoft specs).

I fixed this in the latest version of PEview 0.9.9 which does not throw an exception (however, it does not display these long section names).
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 17, 2012, 06:20:22 AM
Quote from: dedndave on December 17, 2012, 04:05:06 AM
don't know if you remember him, Frank
he promised us 100 to 1 compression - lol

http://www.masmforum.com/board/index.php?topic=13454.0 (http://www.masmforum.com/board/index.php?topic=13454.0)

Yes Dave, I remember him. I also partially agreed with him because
in some conditions you really can have a compress ratio of 100:1, but
I didn't buy the whole story, of course.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 17, 2012, 08:03:30 AM
Quote from: wjr on December 17, 2012, 05:31:21 AM
I fixed this in the latest version of PEview 0.9.9 which does not throw an exception (however, it does not display these long section names).

Hi Wayne,
Thanks for this. Do you have a download link for the binary? Google has no clue :bgrin:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 17, 2012, 08:28:50 AM
http://wjradburn.com/software/ (http://wjradburn.com/software/)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: qWord on December 17, 2012, 08:29:37 AM
Quote from: wjr on December 17, 2012, 05:31:21 AMThe file LzmaUtil.dll has long section names (debug related) which, although allowed in OBJ files, shouldn't be making it into an EXE/DLL file (according to the Microsoft specs).
gcc - you get what you pay for  :icon_mrgreen:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: hutch-- on December 17, 2012, 08:32:44 AM
 :biggrin:

> gcc - you get what you pay for  :icon_mrgreen:

:P
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 17, 2012, 09:09:06 AM
Quote from: qWord on December 17, 2012, 08:29:37 AM
gcc - you get what you pay for  :icon_mrgreen:

Sorry qWord, I didn't get it. Do you mean gcc, that is free, is better than vs or
the contrary?   ::)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: qWord on December 17, 2012, 09:48:28 AM
nice ... you need to use the good documented linker option "-disable-long-section-names" (or for gcc "-Xlinker -disable-long-section-names") to get a standard conform PE.
:icon_rolleyes:
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 17, 2012, 10:03:30 AM
Quote from: dedndave on December 17, 2012, 08:28:50 AM
http://wjradburn.com/software/ (http://wjradburn.com/software/)

Merci :t

In the meantime, I found out that the part that I managed to compile (i.e. the LZMA library itself) is fully sufficient to write an archiver. Attached the first beta (Disclaimer: YOU HAVE BEEN WARNED!). It compresses around 16% better than WinZip with the non-portable maximum option.

Usage:
1. Compression: Drag a file over LzmaJJ.exe
2. Decompression: Drag a *.lzma file over LzmaJJ.exe

Limitations: 1. Only one file, 2. You need Pelles C to assemble the source (see OPT_DebugL in source code).
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: MichaelW on December 17, 2012, 10:11:54 PM
I have successfully compiled ARITH-N.C and the supporting code from the second edition of the Data Compression Book, using the VC Toolkit 2003 compiler with no optimization specified. The code is older than I remembered, using old-style declarations and recommending a LARGE memory model. It compiled as a Win32 console app, but with /W4 the compiler issued a bunch of warnings.

This is my preliminary result, compressing the current MASM32 windows.inc using order 3 arithmetic coding, manually timed at ~10 seconds:

Compressing \masm32\include\windows.inc to winc.z
Using Adaptive order n model with arithmetic coding
................................................................................
............
Input bytes:        977412
Output bytes:       188323
Compression ratio:  81%


Now to see if it will expand...

Looks OK, and FC /B found no differences, but the time was ~9 seconds?

With /O2 /G6 optimization compession and expansion each took ~~3 seconds.

To give you an idea of the code complexity, ARITH-N.C is 1102 lines with ~~25% comments.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 17, 2012, 11:40:12 PM
Well Michael, considering that you have used a C compiled code, the
result is not bad at all, both in speed and compression ratio.
Probably hand written optimized assembly, with new SSE2/3  opcodes could produce
much better results on the speed side at least.
:t
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 17, 2012, 11:45:46 PM
Quote from: jj2007 on December 17, 2012, 10:03:30 AM

In the meantime, I found out that the part that I managed to compile (i.e. the LZMA library itself) is fully sufficient to write an archiver. Attached the first beta (Disclaimer: YOU HAVE BEEN WARNED!). It compresses around 16% better than WinZip with the non-portable maximum option.

Usage:
1. Compression: Drag a file over LzmaJJ.exe
2. Decompression: Drag a *.lzma file over LzmaJJ.exe

Limitations: 1. Only one file, 2. You need Pelles C to assemble the source (see OPT_DebugL in source code).

It seems to work fine, Jochen :t

One thing puzzles me. Your implementatio of RtlCompress function produces
a zip file 280k long, but if I do select --> sent to --> compressed folder it
produces a 190k zip file. Shouldn't them have the same compression rate?
Isn't it the same function Windows is using?  ::)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: mineiro on December 17, 2012, 11:49:05 PM
The most easy paper that I have found to understand data compression is below. It's nice because the author speaks about the logic, instead of throw mathematical hierogliphs. Good introduction.

http://www.fadden.com/techmisc/hdc/index.htm
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: frktons on December 17, 2012, 11:55:18 PM
Quote from: mineiro on December 17, 2012, 11:49:05 PM
The most easy paper that I have found to understand data compression is below. It's nice because the author speaks about the logic, instead of throw mathematical hierogliphs. Good introduction.

http://www.fadden.com/techmisc/hdc/index.htm


Thanks a lot mineiro, this will be certainly useful along the
way. :t
(http://masm32.com/board/index.php?action=dlattach;topic=1091.0;attach=927)
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: jj2007 on December 18, 2012, 01:42:02 AM
Quote from: frktons on December 17, 2012, 11:45:46 PM
It seems to work fine, Jochen :t
Thanks. One minor correction for the source posted in reply #77:
/LIBPATH:"%ProgramFiles%\PellesC\Lib"
Quotes are needed for those working with an English Windows version.

QuoteOne thing puzzles me. Your implementatio of RtlCompress function produces
a zip file 280k long, but if I do select --> sent to --> compressed folder it
produces a 190k zip file. Shouldn't them have the same compression rate?
Isn't it the same function Windows is using?  ::)

That's a known issue. Microsoft uses different algos for the two functions.
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: dedndave on December 18, 2012, 03:26:08 AM
you can use the windows zipper by shell'ing to rundll32, i think (probably via WMI, as well)
but, it only works on files - not buffers
i suppose you could make a temp file and do it that way

i guess windows 8 has a new method - re: CreateCompressor, et al
seems kinda useless
Title: Re: How difficult is it to build a shrinking/deflating routine?
Post by: MichaelW on December 18, 2012, 06:44:00 AM
I could not understand how the code could work correctly but have a lower expansion rate than the statistics in the book listed, but I now see that I somehow misread the statistics. For order 3 arithmetic compression and text files the listed compression/expansion rates were 1206 and 1173. Order 3 produced the highest compression/expansion rates and the highest compression ratio for text and executables, but order 2 produced a slightly higher compression ratio for graphics.