It's Time to build a Text File Compressor in Masm32

dedndave · December 18, 2012, 12:47:26 PM

the console app looks nice
but, wouldn't it be easier to make it a GUI app ?
no messing with all those line chars and the console window bugs

frktons · December 18, 2012, 02:05:14 PM

Quote from: dedndave on December 18, 2012, 12:47:26 PM
the console app looks nice
but, wouldn't it be easier to make it a GUI app ?
no messing with all those line chars and the console window bugs

Actually I refuse to learn GUI stuff, and I prefer to code in the old console style.
There are only a bunch of APIs for the console, compared with the hundreds for
the GUI. Maybe in the future. For the time being I have too many things to learn
and too little time.

Donkey · December 18, 2012, 02:53:50 PM

Holy cow, I've been reading through a few papers on dictionary algorithms, specifically the LZ family of adaptive dictionary-based algorithms. It's going to take a while to wrap my head around this stuff, and I have quite a bit of studying to do before I can add much to the discussion, but I think a variation of the LZFG algorithm is the way to go. I'm currently reading through the following:

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree
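For readers new to the LZ family, here is a minimal C sketch of the core step shared by LZ77-style dictionary coders: finding the longest earlier occurrence of the upcoming bytes inside a sliding window. This is plain LZ77 matching, not LZFG, which uses a more elaborate dictionary structure; the function name and the `window`/`max_len` parameters are hypothetical choices for illustration. The brute-force search costs O(window × match length) per position, and real coders replace it with hash chains or trees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the longest match for buf[pos..len) that starts
 * inside the preceding window of at most `window` bytes.
 * Returns the match length and stores the backward distance in *dist
 * (0 when nothing matches). Matches may run past pos, as in classic LZ77. */
size_t lz77_longest_match(const unsigned char *buf, size_t len, size_t pos,
                          size_t window, size_t max_len, size_t *dist)
{
    size_t best_len = 0, best_dist = 0;
    size_t start = pos > window ? pos - window : 0;

    assert(buf != NULL && dist != NULL && pos <= len);
    for (size_t cand = start; cand < pos; cand++) {
        size_t n = 0;
        /* compare forward, bounded by end of input and max match length */
        while (pos + n < len && n < max_len && buf[cand + n] == buf[pos + n])
            n++;
        if (n > best_len) {        /* strict '>' keeps the farthest of equal matches */
            best_len = n;
            best_dist = pos - cand;
        }
    }
    *dist = best_dist;
    return best_len;
}
```

An encoder would emit a (distance, length) pair when the match is long enough and a literal byte otherwise, then advance by the number of bytes covered.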

frktons · December 18, 2012, 03:18:49 PM

Quote from: Donkey on December 18, 2012, 02:53:50 PM
Holy cow, I've been reading through a few papers on dictionary algorithms, specifically the LZ family of adaptive dictionary-based algorithms. It's going to take a while to wrap my head around this stuff, and I have quite a bit of studying to do before I can add much to the discussion, but I think a variation of the LZFG algorithm is the way to go. I'm currently reading through the following:

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree

The attachment to the first post of this thread might be interesting
for your studies, then.

frktons · December 19, 2012, 02:19:54 PM

Added some code for Menu Management.

Currently the working keys are:

F1 / H for Help
ESC / E for Exit
Arrows to move from one menu item to another
PgUp/PgDn for the first/last menu item
Home/End, same as above
Numbers / keypad numbers to select the corresponding menu item
C - Copies the text displayed on screen to the clipboard
S - Saves the screen in its own format, with colors
V - Views a saved screen file (only partially implemented for now)
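The navigation rules in the key list above can be written as a small pure function, which keeps them testable apart from the console code. This is a minimal C sketch, not TFSC's code: the `KEY_*` names are hypothetical (not Win32 virtual-key constants), and it assumes the arrow keys wrap around at either end, which the list doesn't specify.

```c
#include <assert.h>

/* Hypothetical key codes for this sketch only; a real console app would
 * map the virtual-key codes from its input events onto these. */
enum menu_key { KEY_UP, KEY_DOWN, KEY_PGUP, KEY_PGDN, KEY_HOME, KEY_END, KEY_DIGIT };

/* Return the new selected index (0-based) in a menu of `count` items.
 * For KEY_DIGIT, `digit` is the numeric value pressed (1 selects item 0).
 * Arrow keys wrap around at the ends (an assumption); out-of-range
 * digits leave the selection unchanged. */
int menu_navigate(int current, int count, enum menu_key key, int digit)
{
    assert(count > 0 && current >= 0 && current < count);
    switch (key) {
    case KEY_UP:   return (current + count - 1) % count;
    case KEY_DOWN: return (current + 1) % count;
    case KEY_PGUP:
    case KEY_HOME: return 0;
    case KEY_PGDN:
    case KEY_END:  return count - 1;
    case KEY_DIGIT:
        return (digit >= 1 && digit <= count) ? digit - 1 : current;
    }
    return current;
}
```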

As usual the update is in the first post.

The procs for loading / scanning / comparing files are under construction.

jj2007 · December 19, 2012, 05:29:57 PM

Quote from: frktons on December 19, 2012, 02:19:54 PM
As usual the update is in the first post.

Hi Frank,
Could you use archives with folder names, e.g. \Masm32\Misc\Compressor? With lots of files, it's a nuisance to hunt for the right folder in WinZip etc. every time...
Thanks,
JJ

frktons · December 19, 2012, 08:27:53 PM

Quote from: jj2007 on December 19, 2012, 05:29:57 PM

Hi Frank,
Could you use archives with folder names, e.g. \Masm32\Misc\Compressor? With lots of files, it's a nuisance to hunt for the right folder in WinZip etc. every time...
Thanks,
JJ

Hi Jochen,

Yes we can. :t
Starting with the next update.

TouEnMasm · December 20, 2012, 03:27:12 AM

This one is still in the old forum: http://www.masmforum.com/board/index.php?topic=15470.msg127506#msg127506

dedndave · December 20, 2012, 03:29:30 AM

and, a few posts down from that one :P

frktons · December 20, 2012, 03:46:55 AM

Having a cab.exe is the dream I was trying to avoid.
Maybe the sources would be more interesting.

Anyway, some updates for TFSC:

- added mouse support for the menu
- allocated two 1 MB buffers for file comparison or storing/compressing
- added a file-comparison routine (to be completed)
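Comparing two files loaded into buffers mostly comes down to finding where they first diverge. A minimal C sketch, assuming both files already sit in memory (as with the two 1 MB buffers mentioned above); the function names and the return convention are hypothetical, not TFSC's actual routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the offset of the first differing byte
 * between a[0..alen) and b[0..blen). If one buffer is a prefix of the
 * other, the result is the shorter length; if they are identical,
 * the result equals alen (== blen). */
size_t first_difference(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    size_t i = 0;

    assert(a != NULL && b != NULL);
    while (i < n && a[i] == b[i])
        i++;
    return i;
}

/* Two buffers are equal when they have the same length and no difference. */
int buffers_equal(const unsigned char *a, size_t alen,
                  const unsigned char *b, size_t blen)
{
    return alen == blen && first_difference(a, alen, b, blen) == alen;
}
```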

Frank

FORTRANS · December 20, 2012, 08:47:06 AM

Quote from: Donkey on December 18, 2012, 02:53:50 PM
Holy cow, I've been reading through a few papers on dictionary algorithms, specifically the LZ family of adaptive dictionary-based algorithms. It's going to take a while to wrap my head around this stuff, and I have quite a bit of studying to do before I can add much to the discussion, but I think a variation of the LZFG algorithm is the way to go. I'm currently reading through the following:

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree

Hi,

I had not heard of LZFG. According to that paper, its
performance is surprisingly good, though from their
comments it sounds complex.

Thanks,

Steve

nidud · January 07, 2013, 08:34:34 AM

deleted

frktons · January 07, 2013, 10:29:39 AM

Quote from: nidud on January 07, 2013, 08:34:34 AM
Not sure how you're going to attack this, but I assume you need to create a token list of equal strings, so here is something to play with:
    include io.inc
    include stdio.inc

    MINSTRING equ 3

    .data
    input   label byte
            incbin <srctxt>
    isize   equ $ - offset input
    token   db isize dup(?)
    tokenz  dd ?
    maxlen  dd ?

    .code

    longest_match:
        xor eax,eax         ; find the longest string
        mov ebx,eax         ; return:
        mov edx,eax         ; EBX: length of string
        lea edi,token       ; EDX: offset in token buffer
        mov ecx,tokenz
        cmp ecx,1
        ja  scan
        ret
    scan:
        mov al,[esi]
        repnz scasb
        je  @F
        ret
    @@:
        push edi
        push esi
        push ecx
        inc esi
        repe cmpsb
        je  @F
        dec edi
        dec esi
    @@:
        pop ecx
        mov eax,esi
        pop esi
        pop edi
        sub eax,esi
        cmp eax,ebx
        jb  @F
        mov ebx,eax
        mov edx,edi
        dec edx
    @@:
        cmp ecx,1
        ja  scan
        ret

    tokenize:
        mov esi,offset input
        mov edi,offset token
        mov ecx,MINSTRING
        mov tokenz,ecx
        rep movsb
    tokenize_loop:
        call longest_match
        cmp ebx,MINSTRING
        jae tokenize_match
        mov eax,tokenz
        mov edi,offset token
        mov ecx,edi
        cmp esi,ecx
        ja  tokenize_end
        add edi,eax
        sub ecx,esi
        cmp ecx,MINSTRING
        jb  tokenize_break
        mov ecx,MINSTRING
        add tokenz,ecx
        add eax,ecx
        cmp eax,isize
        jae tokenize_rest
        rep movsb
        jmp tokenize_loop
    tokenize_match:
        cmp ebx,maxlen
        jb  @F
        mov maxlen,ebx
    @@:
        mov ecx,offset token
        cmp esi,ecx
        ja  tokenize_end
        sub ecx,esi
        cmp ecx,ebx
        jb  tokenize_break
        add esi,ebx
        jmp tokenize_loop
    tokenize_break:
        mov edi,tokenz
        add edi,offset token
    tokenize_rest:
        rep movsb
    tokenize_end:
        ret

    main proc c uses esi edi ebx
    ;   mov [eax],eax
        call tokenize
        invoke printf,"\ninput:\t%7d\noutput:\t%7d\nmaxlen:\t%7d\n",isize,tokenz,maxlen
        .if osopen("token",_A_NORMAL, M_WRONLY, A_CREATETRUNC) != -1
            mov esi,eax
            invoke oswrite,esi,addr token,tokenz
            invoke close,esi
        .endif
        sub eax,eax
        ret
    main endp

    end

With a minimum string length of 3 bytes, the dictionary for the Data_Compression.txt file is 15258 bytes:

    input:   253568
    output:   15258
    maxlen:      20
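For readers who don't read MASM, here is a rough C transcription of the idea in nidud's tokenizer quoted above. The dictionary is seeded with the first MINSTRING bytes. At each input position the longest match inside the dictionary built so far is looked up: if it is at least MINSTRING bytes the input is skipped, otherwise the next MINSTRING bytes are appended. It is a sketch under that reading and may differ from the assembly in edge cases; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MINSTRING 3

struct token_stats { size_t dict_len; size_t max_match; };

/* Longest prefix of s[0..slen) occurring in dict[0..dlen);
 * a match is bounded by the end of the dictionary. */
static size_t dict_longest_match(const unsigned char *dict, size_t dlen,
                                 const unsigned char *s, size_t slen)
{
    size_t best = 0;
    for (size_t d = 0; d < dlen; d++) {
        size_t n = 0;
        while (d + n < dlen && n < slen && dict[d + n] == s[n])
            n++;
        if (n > best)
            best = n;
    }
    return best;
}

/* Build the dictionary into `dict`, which must hold at least len bytes. */
struct token_stats tokenize(const unsigned char *in, size_t len, unsigned char *dict)
{
    struct token_stats st = {0, 0};
    size_t i = len < MINSTRING ? len : MINSTRING;

    assert(in != NULL && dict != NULL);
    memcpy(dict, in, i);           /* seed with the first MINSTRING bytes */
    st.dict_len = i;
    while (i < len) {
        size_t m = dict_longest_match(dict, st.dict_len, in + i, len - i);
        if (m >= MINSTRING) {      /* already in the dictionary: skip it */
            if (m > st.max_match)
                st.max_match = m;
            i += m;
        } else {                   /* new material: append up to MINSTRING bytes */
            size_t take = len - i < MINSTRING ? len - i : MINSTRING;
            memcpy(dict + st.dict_len, in + i, take);
            st.dict_len += take;
            i += take;
        }
    }
    return st;
}
```

`dict_len` plays the role of `tokenz` and `max_match` of `maxlen` in the assembly.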

The overall project aims to implement the usual algorithms
(Huffman, LZ, LZW, arithmetic coding...) and afterwards see how to
create something new, hopefully faster or with a better
compression ratio.
Faster could be possible thanks to some assembly tricks; a better
compression ratio will depend on many things and is not tested yet.
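Of the algorithms on that list, LZW is perhaps the simplest to sketch. A minimal C encoder, assuming 8-bit input symbols, a dictionary seeded with the 256 single bytes, a fixed 4096-entry cap with no reset when it fills, and codes returned as plain integers rather than packed into variable-width bit fields. The linear dictionary search keeps it short but slow; a real implementation would use a hash table. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LZW_MAX_CODES 4096   /* 12-bit codes; the dictionary stops growing when full */

/* Each entry at index >= 256 is (prefix code, appended byte). */
struct lzw_entry { int prefix; unsigned char byte; };

/* Linear search for the code of string (prefix + byte); -1 if absent. */
static int lzw_find(const struct lzw_entry *dict, int size, int prefix, unsigned char byte)
{
    for (int c = 256; c < size; c++)
        if (dict[c].prefix == prefix && dict[c].byte == byte)
            return c;
    return -1;
}

/* Encode in[0..len) into out (room for at least len codes);
 * returns the number of codes written. Not thread-safe (static table). */
size_t lzw_encode(const unsigned char *in, size_t len, int *out)
{
    static struct lzw_entry dict[LZW_MAX_CODES];
    int size = 256;              /* codes 0..255 are the single bytes */
    size_t n = 0;

    if (len == 0)
        return 0;
    assert(in != NULL && out != NULL);
    int w = in[0];               /* code of the current string */
    for (size_t i = 1; i < len; i++) {
        int c = lzw_find(dict, size, w, in[i]);
        if (c >= 0) {
            w = c;               /* extend the current string */
        } else {
            out[n++] = w;        /* emit the longest known string */
            if (size < LZW_MAX_CODES) {
                dict[size].prefix = w;
                dict[size].byte = in[i];
                size++;
            }
            w = in[i];
        }
    }
    out[n++] = w;
    return n;
}
```

On the classic test string "TOBEORNOTTOBEORTOBEORNOT" (24 bytes) this emits 16 codes, the last seven of them dictionary references (codes >= 256).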

nidud · January 07, 2013, 10:45:08 PM

deleted

frktons · January 08, 2013, 12:47:50 AM

Quote from: nidud on January 07, 2013, 10:45:08 PM
This is one way of reading bits from the input stream:

    .data
    bb      dd ?        ; bit buffer
    bk      db ?        ; number of bits in bb
    ios_i   dd ?        ; index in input stream

    .code

    getbits:
        cmp bk,al
        jb  @F
        mov cl,al
        mov eax,1       ; create mask (a mask table is maybe better..)
        shl eax,cl
        dec eax
        and eax,bb      ; bits to EAX
        sub bk,cl       ; dec bit count
        shr bb,cl       ; dump used bits
        inc cl          ; cl > 0, so ZF is cleared: bits returned
        ret
    @@:
        push eax        ; add a byte to bb
        mov eax,ios_i
        cmp eax,isize
        je  @F          ; eof.. (returns with ZF set)
        inc ios_i
        add eax,offset input
        movzx eax,byte ptr [eax]
        mov cl,bk
        shl eax,cl
        or  bb,eax
        add bk,8
        pop eax
        jmp getbits
    @@:
        pop eax
        ret

Thanks nidud, these suggestions will soon be useful for
the compression task. :t
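nidud's getbits reads bits least-significant-bit first: each new input byte is OR-ed in above the bits already buffered, and requests are served from the low end. A minimal C sketch of the same scheme, assuming requests of at most 24 bits at a time so the 32-bit buffer never overflows when a byte is added. The struct and function names are hypothetical, and end of input is signalled by the return value instead of the zero flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bit_reader {
    const unsigned char *data;  /* input stream */
    size_t size;                /* input length in bytes */
    size_t pos;                 /* index of next unread byte (ios_i) */
    uint32_t buf;               /* bit buffer (bb) */
    unsigned count;             /* number of valid bits in buf (bk) */
};

/* Read n bits (1..24), LSB first. Returns 1 and stores the bits in *out,
 * or returns 0 if the input runs out before n bits are available. */
int get_bits(struct bit_reader *br, unsigned n, uint32_t *out)
{
    assert(n >= 1 && n <= 24);
    while (br->count < n) {                 /* refill one byte at a time */
        if (br->pos == br->size)
            return 0;                       /* eof */
        br->buf |= (uint32_t)br->data[br->pos++] << br->count;
        br->count += 8;
    }
    *out = br->buf & ((UINT32_C(1) << n) - 1);  /* keep the low n bits */
    br->buf >>= n;                          /* drop the bits just used */
    br->count -= n;
    return 1;
}
```

For example, reading 3 bits from the byte 0xB5 yields 5 (its low bits 101), and the next 4 bits yield 6.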

The MASM Forum