How difficult is it to build a shrinking/deflating routine?

Started by frktons, December 15, 2012, 10:58:49 AM


dedndave

RtlDecompressBuffer
http://msdn.microsoft.com/en-us/library/windows/hardware/ff552191%28v=vs.85%29.aspx

RtlCompressBuffer
http://msdn.microsoft.com/en-us/library/windows/hardware/ff552127%28v=vs.85%29.aspx


ok
use RtlCompressBuffer to create a raw binary
add it to your program as a resource
use the LoadFromRsrc routine i posted above to get it into a buffer
use RtlDecompressBuffer to decompress it

BANG - 50%
which is, you guessed it, $50

of course, you could just create a DB list like you have of the compressed binary
then, you don't need my routine - just put it in the .DATA section
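The Rtl* routines above are Windows-only and use the LZNT1 format, so here is just an illustration of the same pattern (compress once at build time, embed the blob, decompress at runtime) sketched in Python with zlib as a stand-in for the Rtl* calls:

```python
import zlib

def compress_resource(data: bytes) -> bytes:
    """Build-time step: compress raw bytes for embedding.
    Stand-in for RtlCompressBuffer (zlib's format differs from LZNT1)."""
    return zlib.compress(data, level=9)

def decompress_resource(blob: bytes) -> bytes:
    """Runtime step: recover the original bytes from the embedded blob.
    Stand-in for RtlDecompressBuffer."""
    return zlib.decompress(blob)

# Highly repetitive data compresses very well
original = b"blabal" * 100
blob = compress_resource(original)
assert decompress_resource(blob) == original
print(len(original), "->", len(blob))
```

The ratio depends entirely on how repetitive the data is; a string like the one being discussed here shrinks far below 50%.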

dedndave

you will want to use
        INCLUDE    \masm32\include\masm32rt.inc
        INCLUDE    \masm32\include\ntoskrnl.inc
        INCLUDELIB \masm32\lib\ntoskrnl.lib

frktons

Yes Dave, I've thought about these systems as well.

But where is the fun in using precooked meals?

Here we have something that will help in the process of
building our own code.  :lol:
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

ok
write your own code
hopefully, it is smaller than the 606 bytes you are trying to save   :biggrin:
that's the advantage of using the API function

frktons

Quote from: dedndave on December 15, 2012, 01:50:35 PM
ok
write your own code
hopefully, it is smaller than the 606 bytes you are trying to save   :biggrin:
that's the advantage of using the API function

The point is, I'm thinking about multiple long strings, so the code will
earn its 50% many times inside the program.

nidud

deleted

frktons

Hi nidud.

Will you also post the algo to obtain:

bcode db 0,6,5*6 ; 'blabalblabalblabalblabalblabal'
db -3,3,3 ; 'bal'
db -3,2,2 ; 'ba'
db 6,2,2 ; 'hn'
db -4,2,2 ; 'ba'
db 8,3,3 ; 'gbg'
db -2,2,2 ; 'bg'
db 11,3,3 ; 'abg'
db -10,3,3*3 ; 'bagbagbag'
db -56,56,2*56 ; repeat line * 2
db -1 ; 45 byte


I'm missing the compression phase.
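nidud's exact token format isn't documented here (his post was deleted), but the db list above reads like LZ77-style back-references: a distance, a match length, and a total output count that can exceed the distance (which is how `-10,3,3*3` expands to 'bagbagbag'). A generic greedy compressor and decompressor in that flavor might look like this sketch; the names and token layout are illustrative, not nidud's:

```python
def lz77_compress(data: bytes, window: int = 255, min_len: int = 3):
    """Greedy LZ77: emit either a literal byte or a (distance, length) back-reference.
    Overlapping matches are allowed, so runs compress like RLE for free."""
    out, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            dist, l = i - j, 0
            # source wraps modulo the distance when the match overlaps position i
            while i + l < len(data) and data[j + (l % dist)] == data[i + l]:
                l += 1
            if l > best_len:
                best_len, best_dist = l, dist
        if best_len >= min_len:
            out.append(("ref", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out

def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for t in tokens:
        if t[0] == "lit":
            out.append(t[1])
        else:
            _, dist, length = t
            for _ in range(length):      # byte-by-byte copy handles overlap
                out.append(out[-dist])
    return bytes(out)
```

On 'blabal' repeated five times this emits six literals and a single overlapping back-reference, so the compression phase is mostly a question of how hard you search for the longest match.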

dedndave

LZW data compression is not rocket science
basically, you break up the data and replace sections with tokens
in the token table, you put the original "string"
i think the token table can only hold something like ~254 tokens
when it gets full, and you need to create a new token, you trash it and start a new token table
the tokens then replace the strings in the compressed data stream
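The steps above can be made concrete with a minimal textbook LZW pair. One caveat on the numbers: common implementations cap the table at a fixed code width (e.g. 4096 entries at 12 bits) rather than ~254, but the reset-when-full idea is the same. This is a generic sketch, not any particular library's API:

```python
def lzw_compress(data: bytes):
    """Textbook LZW: the table starts with all 256 single bytes and
    grows one entry per emitted code."""
    table = {bytes([i]): i for i in range(256)}
    next_code, out, w = 256, [], b""
    for b in data:
        wb = w + bytes([b])
        if wb in table:
            w = wb                      # keep extending the current phrase
        else:
            out.append(table[w])        # emit the longest known phrase
            table[wb] = next_code       # learn the new phrase
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    """Rebuilds the same table on the fly, so no table is transmitted."""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        entry = table[code] if code in table else prev + prev[:1]  # cScSc corner case
        out.extend(entry)
        table[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return bytes(out)
```

Note the decompressor never receives the token table; it reconstructs it from the code stream itself, which is the part that makes LZW attractive for small embedded data.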

so - you can write your own if you like
there are also pre-written libraries like gzip, etc
i seem to recall someone making a LIB and INC for gzip a while back
don't remember if it was in the new forum or the old one

frktons

Quote from: dedndave on December 15, 2012, 02:03:58 PM
LZW data compression is not rocket science
so - you can write your own if you like
there are also pre-written libraries like gzip, etc
i seem to recall someone making a LIB and INC for gzip a while back
don't remember if it was in the new forum or the old one

Probably it was on the old forum. By the way, as you say, I can write a simplified
algo that suits my needs. I've thought about it on and off over the last two years,
whenever the matter came back to mind, and I've got some ideas to try.
And I think writing some code shouldn't hurt in the process of learning a bit of
Assembly :lol:

dedndave

i updated my previous post
LZW compression...

Quote
basically, you break up the data and replace sections with tokens
in the token table, you put the original "string"
i think the token table can only hold something like ~254 tokens
when it gets full, and you need to create a new token, you trash it and start a new token table
the tokens then replace the strings in the compressed data stream

frktons

I think the algo was translated by Jochen:
http://www.masmforum.com/board/index.php?topic=15470.0

Quote from: dedndave on December 15, 2012, 02:12:01 PM
LZW compression...

basically, you break up the data and replace sections with tokens
in the token table, you put the original "string"
i think the token table can only hold something like ~254 tokens
when it gets full, and you need to create a new token, you trash it and start a new token table
the tokens then replace the strings in the compressed data stream

Yes Dave, this is one of the few things I understand about compression methods.
And I think it'll be enough for the time being.   :t

nidud

deleted

dedndave

yes - the trick is to pick good strings to tokenize
token selection should match the type of data
that's the key to getting good compression ratios

plain text like this is probably the easiest data type to work with (except for a shit-load of zeros)
it's somewhat predictable   :P

frktons

Quote from: nidud on December 15, 2012, 02:19:38 PM

:biggrin:

There are many algos to do that
You scan the output buffer for duplicate strings

The most common is to use { WORD offset, BYTE length }
The minimum string length is then 3 bytes

The string buffer is also converted to bits:
a = 0
b = 10
l = 1100
h = 1101
n = 1110
g = 1111

bla = 10 1100 0 = 7 bits


OK nidud, if you feel like doing it, show me your idea in the code
I posted, and tell me what % you get shrinking the string.
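nidud's letter table quoted above is a prefix (Huffman-style) code: no code is a prefix of another, so the bit stream decodes unambiguously without separators. An encoder/decoder for exactly that table can be sketched as:

```python
# nidud's table from the quote above: frequent letters get short codes
CODES = {"a": "0", "b": "10", "l": "1100", "h": "1101", "n": "1110", "g": "1111"}

def encode(text: str) -> str:
    """Concatenate the variable-length codes into one bit string."""
    return "".join(CODES[ch] for ch in text)

def decode(bits: str) -> str:
    """Greedy prefix decode: accumulate bits until they match a code."""
    rev = {v: k for k, v in CODES.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in rev:          # safe because no code is a prefix of another
            out.append(rev[cur])
            cur = ""
    return "".join(out)

assert encode("bla") == "1011000"            # the 7-bit example from the quote
assert decode(encode("blabalbag")) == "blabalbag"
```

A real implementation would pack the bit string into bytes, but the 7-bit 'bla' example checks out: 10 + 1100 + 0.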

Quote from: dedndave on December 15, 2012, 02:32:13 PM
yes - the trick is to pick good strings to tokenize
token selection should match the type of data
that's the key to getting good compression ratios

That's also the most difficult thing to do in my opinion.