Author Topic: It's Time to build a Text File Compressor in Masm32  (Read 20076 times)

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: It's Time to build a Text File Compressor in Masm32
« Reply #15 on: December 18, 2012, 12:47:26 PM »
the console app looks nice
but, wouldn't it be easier to make it a GUI app ?
no messing with all those line chars and the console window bugs

frktons

  • Member
  • ****
  • Posts: 512
Re: It's Time to build a Text File Compressor in Masm32
« Reply #16 on: December 18, 2012, 02:05:14 PM »
Actually I refuse to learn GUI stuff, and I prefer to code in the old console style.
There are only a handful of APIs for the console, against hundreds for
the GUI. Maybe in the future. For the time being I have too many things to learn
and too little time.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

Donkey

  • Member
  • **
  • Posts: 202
  • ASS-embler
    • Donkey's Stable
Re: It's Time to build a Text File Compressor in Masm32
« Reply #17 on: December 18, 2012, 02:53:50 PM »
Holy cow, I've been reading through a few papers on dictionary algorithms, specifically the LZ adaptive dictionary-based group of algorithms. It's going to take a while to wrap my head around this stuff. I have quite a bit of studying to do before I can add much to the discussion, but I think a permutation of the LZFG algorithm is the way to go. I'm currently reading through the following:

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama


frktons

  • Member
  • ****
  • Posts: 512
Re: It's Time to build a Text File Compressor in Masm32
« Reply #18 on: December 18, 2012, 03:18:49 PM »
Probably the attachment to the first post of this thread could be interesting
for your studies then.

frktons

  • Member
  • ****
  • Posts: 512
Re: It's Time to build a Text File Compressor in Masm32
« Reply #19 on: December 19, 2012, 02:19:54 PM »
Added some code for Menu Management.

Currently the working keys are:

F1 / H for Help
ESC / E for Exit
Arrows to move from one menu item to another
PgUp/PgDn to jump to the first/last menu item
Home/End as above
Numbers / numeric-pad numbers to select the corresponding menu item
C - Copies the displayed screen text to the clipboard
S - Saves the screen with its own format and colors
V - Views a saved screen file (only partially implemented for the time being)

As usual the update is in the first post.
 
The procs for loading / scanning / comparing files are under construction.

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: It's Time to build a Text File Compressor in Masm32
« Reply #20 on: December 19, 2012, 05:29:57 PM »
As usual the update is in the first post.

Hi Frank,
Could you use archives with folder names, e.g. \Masm32\Misc\Compressor? With lots of files, it's a nuisance to look every time for the right folder in WinZip etc...
Thanks,
JJ

frktons

  • Member
  • ****
  • Posts: 512
Re: It's Time to build a Text File Compressor in Masm32
« Reply #21 on: December 19, 2012, 08:27:53 PM »

Hi Jochen,

Yes we can.  :t
From next update.

TouEnMasm

  • Member
  • *****
  • Posts: 1764
    • EditMasm
Re: It's Time to build a Text File Compressor in Masm32
« Reply #22 on: December 20, 2012, 03:27:12 AM »
Fa is a musical note to play with CL

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: It's Time to build a Text File Compressor in Masm32
« Reply #23 on: December 20, 2012, 03:29:30 AM »
and, a few posts down from that one   :P

frktons

  • Member
  • ****
  • Posts: 512
Re: It's Time to build a Text File Compressor in Masm32
« Reply #24 on: December 20, 2012, 03:46:55 AM »
Having a cab.exe is the dream I was trying to avoid.
Maybe the sources could be more interesting.

Anyway, some updates for TFSC:

- added mouse support to manage the menu
- allocated 2 buffers of 1 MB each for file comparison or storing/compressing.
- added a routine [to be completed] for file comparison.
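
The comparison routine itself is not posted yet. As a rough sketch only, assuming the two 1 MB buffers are compared byte by byte, the core step might look like this in C (the name `first_difference` and its return convention are hypothetical, not taken from TFSC):

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first differing byte,
   or -1 if both buffers have the same length and the same content.
   If one buffer is a prefix of the other, the shorter length is returned. */
long first_difference(const unsigned char *a, size_t a_len,
                      const unsigned char *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (long)i;
    return a_len == b_len ? -1 : (long)n;
}
```

In assembly the inner loop maps naturally onto `repe cmpsb`, which is probably where a hand-written version would gain its speed.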

Frank
« Last Edit: December 20, 2012, 01:25:07 PM by frktons »

FORTRANS

  • Member
  • *****
  • Posts: 1238
Re: It's Time to build a Text File Compressor in Masm32
« Reply #25 on: December 20, 2012, 08:47:06 AM »
Holy cow, I've been reading through a few papers on dictionary algorithms, specifically the LZ adaptive dictionary-based group of algorithms. It's going to take a while to wrap my head around this stuff. I have quite a bit of studying to do before I can add much to the discussion, but I think a permutation of the LZFG algorithm is the way to go. I'm currently reading through the following:

http://wisdombasedcomputing.com/vol1issue3december2011/paper34.pdf
http://everything2.com/title/suffix+tree

Hi,

   Had not heard of LZFG.  Its performance is surprisingly good
according to that paper.  Sounds complex though from their
comments.

Thanks,

Steve

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: It's Time to build a Text File Compressor in Masm32
« Reply #26 on: January 07, 2013, 08:34:34 AM »
deleted
« Last Edit: February 25, 2022, 06:12:18 AM by nidud »

frktons

  • Member
  • ****
  • Posts: 512
Re: It's Time to build a Text File Compressor in Masm32
« Reply #27 on: January 07, 2013, 10:29:39 AM »
Not sure how you are going to attack this, but I assume you need to create a token list of equal strings, so here is something to play with:
Code:
include io.inc
include stdio.inc

MINSTRING equ 3

.data

input label byte
incbin <srctxt>
isize equ $ - offset input

token db isize dup(?)
tokenz dd ?
maxlen dd ?

.code

longest_match:
    xor     eax,eax         ; find the longest string
    mov     ebx,eax         ; return:
    mov     edx,eax         ;  EBX: length of string
    lea     edi,token       ;  EDX: offset in token buffer
    mov     ecx,tokenz
    cmp     ecx,1
    ja      scan
    ret
scan:
    mov     al,[esi]
    repnz   scasb
    je      @F
    ret
@@:
    push    edi
    push    esi
    push    ecx
    inc     esi
    repe    cmpsb
    je      @F
    dec     edi
    dec     esi
@@:
    pop     ecx
    mov     eax,esi
    pop     esi
    pop     edi
    sub     eax,esi
    cmp     eax,ebx
    jb      @F
    mov     ebx,eax
    mov     edx,edi
    dec     edx
@@:
    cmp     ecx,1
    ja      scan
    ret

tokenize:
    mov     esi,offset input
    mov     edi,offset token
    mov     ecx,MINSTRING
    mov     tokenz,ecx
    rep     movsb
tokenize_loop:
    call    longest_match
    cmp     ebx,MINSTRING
    jae     tokenize_match
    mov     eax,tokenz
    mov     edi,offset token
    mov     ecx,edi
    cmp     esi,ecx
    ja      tokenize_end
    add     edi,eax
    sub     ecx,esi
    cmp     ecx,MINSTRING
    jb      tokenize_break
    mov     ecx,MINSTRING
    add     tokenz,ecx
    add     eax,ecx
    cmp     eax,isize
    jae     tokenize_rest
    rep     movsb
    jmp     tokenize_loop
tokenize_match:
    cmp     ebx,maxlen
    jb      @F
    mov     maxlen,ebx
@@:
    mov     ecx,offset token
    cmp     esi,ecx
    ja      tokenize_end
    sub     ecx,esi
    cmp     ecx,ebx
    jb      tokenize_break
    add     esi,ebx
    jmp     tokenize_loop
tokenize_break:
    mov     edi,tokenz
    add     edi,offset token
tokenize_rest:
    rep     movsb
tokenize_end:
    ret

main proc c uses esi edi ebx
;   mov     [eax],eax
    call    tokenize
    invoke  printf,"\ninput:\t%7d\noutput:\t%7d\nmaxlen:\t%7d\n",isize,tokenz,maxlen
    .if osopen("token",_A_NORMAL, M_WRONLY, A_CREATETRUNC) != -1
        mov     esi,eax
        invoke  oswrite,esi,addr token,tokenz
        invoke  close,esi
    .endif
    sub     eax,eax
    ret
main endp

end

With a minimum string length of 3 bytes, the dictionary comes to 15258 bytes for the Data_Compression.txt file.

Code:
input: 253568
output:   15258
maxlen:      20
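
For anyone who prefers to check the logic in a higher-level language first, here is a simplified C reading of the idea in the listing above. It is a sketch under assumptions, not a port: the function names `longest_match` and `build_dictionary` and their signatures are made up here, the end-of-input handling is simpler than in the assembly, and it is not guaranteed to reproduce the 15258 figure. The idea: look up the longest match for the current input position in the dictionary built so far; if it is at least MINSTRING bytes long, skip it, otherwise append the next MINSTRING bytes to the dictionary.

```c
#include <stddef.h>
#include <string.h>

#define MINSTRING 3

/* Length of the longest prefix of in[0..in_len) that occurs
   anywhere inside dict[0..dict_len). Brute-force search. */
static size_t longest_match(const unsigned char *dict, size_t dict_len,
                            const unsigned char *in, size_t in_len)
{
    size_t best = 0;
    for (size_t start = 0; start < dict_len; start++) {
        size_t len = 0;
        while (start + len < dict_len && len < in_len &&
               dict[start + len] == in[len])
            len++;
        if (len > best)
            best = len;
    }
    return best;
}

/* Builds the dictionary into dict (capacity must be >= in_len).
   Returns the dictionary size; *maxlen receives the longest match seen. */
size_t build_dictionary(const unsigned char *in, size_t in_len,
                        unsigned char *dict, size_t *maxlen)
{
    size_t dict_len = 0, pos = 0;
    *maxlen = 0;
    while (pos < in_len) {
        size_t left = in_len - pos;
        size_t len = longest_match(dict, dict_len, in + pos, left);
        if (len >= MINSTRING) {
            /* Already covered by the dictionary: just skip it. */
            if (len > *maxlen)
                *maxlen = len;
            pos += len;
        } else {
            /* New material: append up to MINSTRING bytes. */
            size_t n = left < MINSTRING ? left : MINSTRING;
            memcpy(dict + dict_len, in + pos, n);
            dict_len += n;
            pos += n;
        }
    }
    return dict_len;
}
```

For the input "abcabcabcxyz" this produces the 6-byte dictionary "abcxyz" with a longest match of 3. The search is quadratic, so it is only meant for checking results against the assembly version, not for speed.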

The overall project aims at implementing the usual algos first:
Huffman, LZ, LZW, Arithmetic... and then seeing how to
create something new, hopefully faster or with a superior
compression ratio.
Faster could be possible thanks to some Assembly tricks; a superior
compression ratio will depend on many things that have not been tested yet.


nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: It's Time to build a Text File Compressor in Masm32
« Reply #28 on: January 07, 2013, 10:45:08 PM »
deleted
« Last Edit: February 25, 2022, 06:11:51 AM by nidud »

frktons

  • Member
  • ****
  • Posts: 512
Re: It's Time to build a Text File Compressor in Masm32
« Reply #29 on: January 08, 2013, 12:47:50 AM »
This is one way of reading bits from the input stream:

Code:
.data

bb dd ? ; bit buffer
bk db ? ; number of bits in bb
ios_i dd ? ; index in input stream

.code

getbits:
    cmp     bk,al
    jb      @F
    mov     cl,al
    mov     eax,1           ; create mask (a mask table is maybe better..)
    shl     eax,cl
    dec     eax
    and     eax,bb          ; bits to EAX
    sub     bk,cl           ; dec bit count
    shr     bb,cl           ; dump used bits
    inc     cl              ; clear ZF: success (ZF stays set only on eof)
    ret
@@:
    push    eax             ; add a byte to bb
    mov     eax,ios_i
    cmp     eax,isize
    je      @F              ; eof..
    inc     ios_i
    add     eax,offset input
    movzx   eax,byte ptr [eax]
    mov     cl,bk
    shl     eax,cl
    or      bb,eax
    add     bk,8
    pop     eax
    jmp     getbits
@@:
    pop     eax
    ret
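
The same bit-reading scheme can be written in C for comparison. This is a minimal sketch assuming bits are consumed least significant bit first, as in the listing above; the `BitReader` struct and the `get_bits` signature are invented for the example, with fields named after the assembly variables. The request is limited to 24 bits so that a freshly loaded byte always fits into the 32-bit buffer.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;  /* input stream */
    size_t size;          /* bytes in the stream (isize) */
    size_t pos;           /* next byte to load (ios_i) */
    uint32_t bb;          /* bit buffer */
    unsigned bk;          /* number of valid bits in bb */
} BitReader;

/* Reads n bits (1..24), least significant bit first.
   Returns 1 and stores the value in *out, or 0 at end of input. */
int get_bits(BitReader *r, unsigned n, uint32_t *out)
{
    while (r->bk < n) {                 /* refill one byte at a time */
        if (r->pos == r->size)
            return 0;                   /* eof */
        r->bb |= (uint32_t)r->data[r->pos++] << r->bk;
        r->bk += 8;
    }
    *out = r->bb & ((1u << n) - 1u);    /* mask off the low n bits */
    r->bb >>= n;                        /* dump used bits */
    r->bk -= n;
    return 1;
}
```

With the two input bytes 0xB5 and 0x01, reading 3, 5 and 4 bits returns 5, 22 and 1, and a further 8-bit read reports end of input. The return value plays the role of the zero flag in the assembly version.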


Thanks nidud, these suggestions will soon be useful for
the compression task.  :t