News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

It's Time to build a Text File Compressor in Masm32

Started by frktons, December 18, 2012, 04:20:28 AM

Previous topic - Next topic

dedndave

the console app looks nice
but, wouldn't it be easier to make it a GUI app ?
no messing with all those line chars and the console window bugs

frktons

Quote from: dedndave on December 18, 2012, 12:47:26 PM
the console app looks nice
but, wouldn't it be easier to make it a GUI app ?
no messing with all those line chars and the console window bugs


Actually I refuse to learn GUI stuff, and I prefer to code in old console style.
There are only a bounce of APIs for the console, against the hundreds for
the GUI. Maybe in the future. For the time being I have too many things to learn
and too short time.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

Donkey

Holy cow, I've been reading through a few papers on dictionary algorithms specifically the LZ adaptive dictionary-based group of algorithms. Its going to take a while to wrap my head around this stuff, I have quite a bit of studying to do before I can add much to the discussion but I think a permutation of the LZFG algorithm is the way to go. I'm currently reading through the following

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

frktons

Quote from: Donkey on December 18, 2012, 02:53:50 PM
Holy cow, I've been reading through a few papers on dictionary algorithms specifically the LZ adaptive dictionary-based group of algorithms. Its going to take a while to wrap my head around this stuff, I have quite a bit of studying to do before I can add much to the discussion but I think a permutation of the LZFG algorithm is the way to go. I'm currently reading through the following

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree

Probably the attachment to the first post of this thread could be interesting
for your studies then.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

frktons

Added some code for Menu Management.

Actually the working keys are:

F1 / H for Help
ESC / E for Exit
Arrows to move from a menu item to another
PagUP/Down first/last menu item
Home/End as above
Numbers / PAD-Numbers to select corresponding menu item
C - Copies the text from the screen displayed into the clipboard
S - Saves the screen with its own format and colors
V - Partially implemented to view screen file for the time being

As usual the update is in the first post.

The procs for loading / scanning / comparing files are under construction.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

jj2007

Quote from: frktons on December 19, 2012, 02:19:54 PM
As usual the update is in the first post.

Hi Frank,
Could you use archives with folder names, e.g. \Masm32\Misc\Compressor? With lots of files, it's a nuisance to look every time for the right folder in WinZip etc...
Thanks,
JJ

frktons

Quote from: jj2007 on December 19, 2012, 05:29:57 PM

Hi Frank,
Could you use archives with folder names, e.g. \Masm32\Misc\Compressor? With lots of files, it's a nuisance to look every time for the right folder in WinZip etc...
Thanks,
JJ

Hi Jochen,

Yes we can.  :t
From next update.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama


dedndave


frktons

#24
Having a cab.exe is the dream I was trying to avoid.
Maybe the sources could be more interesting.

Anyway, some updates for TFSC:

- added mouse to manage the menu
- allocated 2 buffers 1 mb each for file comparing or storing/compressing.
- added routine [to be completed] for file comparing.

Frank
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

FORTRANS

Quote from: Donkey on December 18, 2012, 02:53:50 PM
Holy cow, I've been reading through a few papers on dictionary algorithms specifically the LZ adaptive dictionary-based group of algorithms. Its going to take a while to wrap my head around this stuff, I have quite a bit of studying to do before I can add much to the discussion but I think a permutation of the LZFG algorithm is the way to go. I'm currently reading through the following

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree

Hi,

   Had not heard of LZFG.  Its performance is surprisingly good
according to that paper.  Sounds complex though from their
comments.

Thanks,

Steve

nidud

#26
deleted

frktons

Quote from: nidud on January 07, 2013, 08:34:34 AM
Not shore how you going to attack this, but I assume you need to create a token list of equal strings, so here is something to play with:

include io.inc
include stdio.inc

MINSTRING equ 3

.data

input label byte
incbin <srctxt>
isize equ $ - offset input

token db isize dup(?)
tokenz dd ?
maxlen dd ?

.code

longest_match:
xor eax,eax ; find the longest string
mov ebx,eax ; return:
mov edx,eax ;  EBX: length of string
lea edi,token ;  EDX: offset in token buffer
mov ecx,tokenz
cmp ecx,1
ja scan
ret
      scan:
mov     al,[esi]
repnz scasb
je @F
ret
      @@:
      push edi
push esi
push ecx
inc esi
repe cmpsb
je @F
dec edi
dec esi
      @@:
pop ecx
mov     eax,esi
pop esi
pop edi
sub eax,esi
cmp eax,ebx
jb @F
mov ebx,eax
mov edx,edi
dec edx
      @@:
      cmp ecx,1
ja scan
ret

tokenize:
mov esi,offset input
mov edi,offset token
mov ecx,MINSTRING
mov tokenz,ecx
rep movsb
    tokenize_loop:
      call longest_match
cmp ebx,MINSTRING
jae tokenize_match
mov eax,tokenz
mov edi,offset token
mov ecx,edi
cmp esi,ecx
ja tokenize_end
add edi,eax
sub ecx,esi
cmp ecx,MINSTRING
jb tokenize_break
mov ecx,MINSTRING
add tokenz,ecx
add eax,ecx
cmp eax,isize
jae tokenize_rest
rep movsb
jmp tokenize_loop
    tokenize_match:
      cmp ebx,maxlen
jb @F
mov maxlen,ebx
      @@:
mov ecx,offset token
cmp esi,ecx
ja tokenize_end
sub ecx,esi
cmp ecx,ebx
jb tokenize_break
add esi,ebx
jmp tokenize_loop
    tokenize_break:
mov edi,tokenz
add edi,offset token
    tokenize_rest:
    rep movsb
    tokenize_end:
ret

main proc c uses esi edi ebx
; mov [eax],eax
call tokenize
invoke printf,"\ninput:\t%7d\noutput:\t%7d\nmaxlen:\t%7d\n",isize,tokenz,maxlen
.if osopen("token",_A_NORMAL, M_WRONLY, A_CREATETRUNC) != -1
    mov esi,eax
    invoke oswrite,esi,addr token,tokenz
    invoke close,esi
.endif
sub eax,eax
ret
main endp

end


The dictionary using minimum 3 byte length is 15258 for the Data_Compression.txt file


input: 253568
output:   15258
maxlen:      20


The overall project is targeted at implementing the usual algos:
Huffman, LZ, LZW, Arithmetic... and see, afterwards, how to
create something new, and hopefully faster or with a superior
compression ratio.
Faster could be possible thanks to some Assembly tricks, superior
as compression ratio will depend on many things, actually not tested.

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

nidud

#28
deleted

frktons

Quote from: nidud on January 07, 2013, 10:45:08 PM
This is one way of reading bits from the input stream:


.data

bb dd ? ; bit buffer
bk db ? ; number of bits in bb
ios_i dd ? ; index in input stream

.code

getbits:
cmp bk,al
jb @F
mov cl,al
mov eax,1 ; create mask (a mask table is maybe better..)
shl eax,cl
dec eax
and eax,bb ; bits to EAX
sub bk,cl ; dec bit count
shr bb,cl ; dump used bits
inc cl ; set ZF flag
ret
      @@:
push eax ; add a byte to bb
mov eax,ios_i
cmp eax,isize
je @F ; eof..
inc ios_i
add eax,offset input
movzx eax,byte ptr [eax]
mov cl,bk
shl eax,cl
or bb,eax
add bk,8
pop eax
jmp getbits
      @@:
pop eax
    ret



Thanks nidud, these suggestions will soon be useful for
the compression task.  :t
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama