The MASM Forum

General => The Workshop => Topic started by: frktons on February 03, 2013, 06:54:01 PM

Title: The shadow of a binary tree
Post by: frktons on February 03, 2013, 06:54:01 PM
I never had occasion to build a binary tree. My knowledge of
the matter is practically non-existent.
I only know they have a structure with a root and branches on the left
for items with smaller values, and on the right for bigger values.
Each item has its own value [a number, a string ...] and two pointers,
one for a lesser value and one for a bigger one.
How to declare them, how to create them, insert items, delete items, move
from one position to another, and all the other operations involved, including
searching and balancing the tree, are still a mystery to me.
Considering they could be useful, if anybody has some experience with
them, please give me your opinion regarding "when" they are useful,
and how to translate theory into ASM code to create, for example, a
binary tree starting from a non-sorted array of random numbers/strings.
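The node layout described above - a value plus two pointers, smaller keys to the left and bigger ones to the right - can be sketched in C like this (a minimal illustration with hypothetical names, meant as HLL reference code to translate to ASM, not a finished implementation):

```c
#include <stdlib.h>
#include <string.h>

/* One node per distinct word: smaller keys go left, bigger go right. */
typedef struct Node {
    char        *word;    /* the item's value (here, a string) */
    int          count;   /* recurrences of the word           */
    struct Node *left;    /* subtree of lesser values          */
    struct Node *right;   /* subtree of bigger values          */
} Node;

/* Insert `word` into the tree rooted at *root; if it is already
   present, just bump its counter.  Returns the word's node. */
Node *tree_insert(Node **root, const char *word)
{
    if (*root == NULL) {
        Node *n = calloc(1, sizeof *n);
        n->word = malloc(strlen(word) + 1);
        strcpy(n->word, word);
        n->count = 1;
        *root = n;
        return n;
    }
    int cmp = strcmp(word, (*root)->word);
    if (cmp == 0) { (*root)->count++; return *root; }
    return tree_insert(cmp < 0 ? &(*root)->left : &(*root)->right, word);
}

/* Search for a word; returns NULL when it is not in the tree. */
Node *tree_find(Node *root, const char *word)
{
    while (root != NULL) {
        int cmp = strcmp(word, root->word);
        if (cmp == 0) return root;
        root = (cmp < 0) ? root->left : root->right;
    }
    return NULL;
}
```

Insertion and search both follow a single path down from the root, so on a reasonably balanced tree each costs only a logarithmic number of comparisons.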

Thanks for any help you can give.

Frank


 
Title: Re: The shadow of a binary tree
Post by: qWord on February 04, 2013, 05:26:41 AM
hello,
the English Wikipedia has good articles on the different kinds of trees - there you can read up on what they are commonly used for. Even if you do not fully understand them at the moment, it should be no problem for you to translate existing HLL code to ASM. For example, I use Red-Black trees for huge (sorted) data sets with a high search rate and few insertions/deletions. One advantage of them is that you can quickly step through the data in order (or in reverse). In the context of search trees, you may also want to take a look at hash tables.
Title: Re: The shadow of a binary tree
Post by: frktons on February 04, 2013, 08:19:54 AM
Thanks qWord.

The few things I know about binary trees were collected from Wikipedia
and the net. Maybe if I detail the task I need to perform, it'll be clearer
whether binary trees are the best option or not.

In my slow progress on the text compression task, I've reached the point
where I have to scan the input text file, map the words inside the file, count
the recurrences of each word, and do a few other things to prepare data for the
next steps.

I thought about arrays first, but they are probably not the best solution. So after some
research I came across binary trees, and after thinking for a while about their possible
use, I believed they could be the solution for quickly building the word
dictionary, scanning the input file only once, and doing all the connected
operations at the same time: counting, selecting, matching, connecting with other
special tokens, and so on.

Being new to both ASM and binary trees, I have only a vague idea of the
sequence of operations needed to define the data structure for a binary tree, and to use it
while feeding it the words found in the input text file.

I'll try to write some sample code and then see how to make it better and consistent
with the specifications.
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 05, 2013, 12:31:47 AM
Hi,

   For LZW compression a hash table is commonly used.
Some people use a tree to do the bookkeeping when they
don't like hashing.  When I wrote my GIF creator, using its
modified LZW compression, I used a simple hash function.
Sort of quick and easy.

   In "The Data Compression Book", LZSS compression
uses a tree to hold the information needed.  So that might
be a resource for you.

Regards,

Steve N.
Title: Re: The shadow of a binary tree
Post by: frktons on February 05, 2013, 12:52:31 AM
Quote from: FORTRANS on February 05, 2013, 12:31:47 AM
Hi,

   For LZW compression a hash table is commonly used.
Some people use a tree to do the bookkeeping when they
don't like hashing.  When I wrote my GIF creator, using its
modified LZW compression, I used a simple hash function.
Sort of quick and easy.

   In "The Data Compression Book", LZSS compression
uses a tree to hold the information needed.  So that might
be a resource for you.

Regards,

Steve N.

Thanks Steve.

Being new to both hash tables and binary trees, I'm not able to
grasp the advantages/disadvantages of the two different approaches.

I need some more info before I can decide what's best suited.

Frank
Title: Re: The shadow of a binary tree
Post by: Tedd on February 05, 2013, 02:45:21 AM
A hash table is better suited :P

The performance of a (plain) binary tree is highly dependent on the order in which you insert items, and it can easily devolve into a linear linked list. For item lookup and frequency counting, a hash table works in fairly constant time for all items, though it can require a little more memory than a binary tree of the same items.
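The lookup-plus-frequency-counting use that Tedd describes might look like this in C, using chaining for collisions (an illustrative sketch only; the bucket count, hash function, and names are arbitrary choices of mine):

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4093           /* prime bucket count; an arbitrary choice */

typedef struct Entry {
    char         *word;
    unsigned      count;
    struct Entry *next;         /* chain of words sharing this bucket */
} Entry;

static Entry *table[NBUCKETS];

/* djb2-style string hash, reduced to the bucket range. */
static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Record one occurrence of `word`: the lookup and the frequency
   update happen in the same (on average constant-time) step.
   Returns the word's running count. */
unsigned count_word(const char *word)
{
    Entry **bucket = &table[hash(word)];
    for (Entry *e = *bucket; e != NULL; e = e->next)
        if (strcmp(e->word, word) == 0)
            return ++e->count;
    Entry *e = calloc(1, sizeof *e);       /* first occurrence */
    e->word = malloc(strlen(word) + 1);
    strcpy(e->word, word);
    e->count = 1;
    e->next  = *bucket;
    *bucket  = e;
    return 1;
}
```

Unlike a plain binary tree, the cost per word here does not depend on the order the words arrive in, only on how evenly the hash spreads them over the buckets.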
Title: Re: The shadow of a binary tree
Post by: frktons on February 05, 2013, 03:26:02 AM
Quote from: Tedd on February 05, 2013, 02:45:21 AM
A hash table is better suited :P

The performance of a (plain) binary tree is highly dependent on the order in which you insert items, and it can easily devolve into a linear linked list. For item lookup and frequency counting, a hash table works in fairly constant time for all items, though it can require a little more memory than a binary tree of the same items.


Thanks Tedd. Good info to think about. Instead of studying binary trees,
maybe it's better to study hash management  :biggrin:
Title: Re: The shadow of a binary tree
Post by: hutch-- on February 06, 2013, 04:28:32 PM
A hash table is a good solution if you have some idea of how many items you wish to handle. A tree can extend to the limit of memory where a hash table tends to be limited to its original count. With a hash table you need to design a decent method of collision detection but if you get it right you can saturate the table to about 80% before it starts to slow down.

If you can get the mechanics right, the next trick is to choose between a fixed allocation of memory for each slot versus dynamic allocation: the former is far faster but at the cost of inflexibility and high memory usage; the latter is very flexible but slower. With dynamic allocation you can make the table much bigger, and unless a slot is used it only consumes a very small amount of memory.
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 07, 2013, 12:36:41 AM
Quote from: frktons on February 05, 2013, 12:52:31 AM
I need some more info before I can decide what's best suited.

Hi,

   "The Data Compression Book" by Mark Nelson has a
decent discussion of trees and a little on hash tables.  For
my program I looked at the source for a version of Unix
compress, and some papers from internet sources.  One
was "LZW and GIF explained" by Steve Blackstock.  Another
was "LZW compression used to encode/decode a GIF file" by
Bob Montgomery.

   For a more thorough discussion, I used "The Art of Computer
Programming", Volume 3/ Searching and Sorting, by Donald
Knuth.  My hash function used a little bit from this (although
misused might also apply).

Regards,

Steve N.
Title: Re: The shadow of a binary tree
Post by: frktons on February 07, 2013, 02:55:29 AM
Quote from: hutch-- on February 06, 2013, 04:28:32 PM
A hash table is a good solution if you have some idea of how many items you wish to handle. A tree can extend to the limit of memory where a hash table tends to be limited to its original count. With a hash table you need to design a decent method of collision detection but if you get it right you can saturate the table to about 80% before it starts to slow down.

If you can get the mechanics right, the next trick is to choose between a fixed allocation of memory for each slot versus dynamic allocation: the former is far faster but at the cost of inflexibility and high memory usage; the latter is very flexible but slower. With dynamic allocation you can make the table much bigger, and unless a slot is used it only consumes a very small amount of memory.

The hash table should have about 64K entries, but that varies depending on
the input text file size. The memory in a current PC is big enough for the task, I guess.

Quote from: FORTRANS on February 07, 2013, 12:36:41 AM
Quote from: frktons on February 05, 2013, 12:52:31 AM
I need some more info before I can decide what's best suited.

Hi,

   "The Data Compression Book" by Mark Nelson has a
decent discussion of trees and a little on hash tables.  For
my program I looked at the source for a version of Unix
compress, and some papers from internet sources.  One
was "LZW and GIF explained" by Steve Blackstock.  Another
was "LZW compression used to encode/decode a GIF file" by
Bob Montgomery.

   For a more thorough discussion, I used "The Art of Computer
Programming", Volume 3/ Searching and Sorting, by Donald
Knuth.  My hash function used a little bit from this (although
misused might also apply).

Regards,

Steve N.

Hi.

Well, I think the speed factor is the most important one.
So besides collisions on hash generation, it depends on how long
it takes to generate hashes on the fly while I'm extracting words
from a text file. If I extract 1 million words, I have to generate
1 million hashes, and compare each extracted word with the word
pointed to by its hash to detect a collision or a match.

What can be done to have unique hashes on unique words?

Frank
Title: Re: The shadow of a binary tree
Post by: qWord on February 07, 2013, 03:10:11 AM
Quote from: frktons on February 07, 2013, 02:55:29 AMWhat can be done to have unique hashes on unique words?
It is the same kind of question as: how can I represent numbers between 0 and 10000000 as 8-bit integers without information loss?
Title: Re: The shadow of a binary tree
Post by: frktons on February 07, 2013, 03:25:47 AM
Quote from: qWord on February 07, 2013, 03:10:11 AM
Quote from: frktons on February 07, 2013, 02:55:29 AMWhat can be done to have unique hashes on unique words?
It is the same kind of question as: how can I represent numbers between 0 and 10000000 as 8-bit integers without information loss?

So it is impossible. Then we move to the next step.  :P
What kind of hash generator can be used to get a reasonable
number of collisions? And if you have some MASM sample code to post,
even better.
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 07, 2013, 04:35:53 AM
Quote from: frktons on February 07, 2013, 03:25:47 AM
What kind of hash generator can be used to get a reasonable
number of collisions? And if you have some MASM sample code to post,
even better.

Hi,

   Well, for the LZW code used in *.GIF files, you need to
generate codes, or tokens, of up to 12 bits.  You emit
an existing token and a character to create the next token.
So I shift the character left four bits and add in the token to
create an index into the hash table.  You then use the index
to check for an empty slot, a filled slot with the same character
as the one you are processing, or a filled slot with a
different character, which means a collision.  The code that
creates the index is pretty simple.  (Mine is 16-bit though.)
The code that uses the index is not simple.

   You basically need to take your data and fold it up somehow
to create a proper sized index into the hash table.  GIF's just
happen to be relatively small, so you can use a simple algorithm.

Regards,

Steve
Title: Re: The shadow of a binary tree
Post by: hutch-- on February 07, 2013, 11:20:41 AM
Frank,

> What can be done to have unique hashes on unique words?

The general trade-off with a hash table is to use a much faster hash number generator and handle the collisions efficiently. The alternative is to use a much more complex number generator that is a lot slower but still has to handle a reduced number of collisions.

The next trick is to always set the hash table count to a prime number so that any looping around the table never falls into a recurring pattern. A rough rule: if you need to load a given number of items into the table, double the table size and you will never reach saturation.

By design, if you make the basic table an array of pointers and use dynamic allocation for each slot, you end up with a very efficient table that handles variable-length data with no problems. Alternatively, if you know the maximum length of the data you want to load into the table, you allocate a single block of memory and set your array of pointers to evenly spaced locations within that memory. The latter is much faster but at the cost of being inflexible.
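The ideas above - a prime table size about double the item count, and a collision step that depends on the data - can be sketched as open addressing in C (a toy illustration of mine; the table size, hash, and names are placeholders, not anyone's actual implementation):

```c
#include <string.h>

#define TSIZE 211               /* prime table size, roughly double the expected item count */

static const char *slots[TSIZE];   /* array of pointers; NULL marks an empty slot */

/* A simple multiplicative string hash, reduced to the table range. */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TSIZE;
}

/* Probe for `word`.  The step depends on the word's length, so
   different items walk the table along different sequences; because
   TSIZE is prime, any step from 1 to TSIZE-1 eventually visits
   every slot.  Returns the slot holding the word, the first empty
   slot on its probe path, or -1 if the table is saturated. */
int find_slot(const char *word)
{
    unsigned i    = hash(word);
    unsigned step = (unsigned)strlen(word) % (TSIZE - 1) + 1;
    for (unsigned n = 0; n < TSIZE; n++) {
        if (slots[i] == NULL || strcmp(slots[i], word) == 0)
            return (int)i;
        i = (i + step) % TSIZE;    /* collision: keep stepping */
    }
    return -1;
}
```

The prime table size is what guarantees the probe sequence cannot cycle through a subset of slots and miss the empty ones.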
Title: Re: The shadow of a binary tree
Post by: frktons on February 08, 2013, 01:48:37 AM
Quote from: FORTRANS on February 07, 2013, 04:35:53 AM

Hi,

   Well, for the LZW code used in *.GIF files, you need to
generate codes, or tokens, of up to 12 bits.  You emit
an existing token and a character to create the next token.
So I shift the character left four bits and add in the token to
create an index into the hash table.  You then use the index
to check for an empty slot, a filled slot with the same character
as the one you are processing, or a filled slot with a
different character, which means a collision.  The code that
creates the index is pretty simple.  (Mine is 16-bit though.)
The code that uses the index is not simple.

   You basically need to take your data and fold it up somehow
to create a proper sized index into the hash table.  GIF's just
happen to be relatively small, so you can use a simple algorithm.

Regards,

Steve

It'd be nice to see the code you use for generating hashes.
Being new to the matter, it could help me to get started.
;)
Quote from: hutch-- on February 07, 2013, 11:20:41 AM
Frank,


The general trade-off with a hash table is to use a much faster hash number generator and handle the collisions efficiently. The alternative is to use a much more complex number generator that is a lot slower but still has to handle a reduced number of collisions.

The next trick is to always set the hash table count to a prime number so that any looping around the table never falls into a recurring pattern. A rough rule: if you need to load a given number of items into the table, double the table size and you will never reach saturation.

By design, if you make the basic table an array of pointers and use dynamic allocation for each slot, you end up with a very efficient table that handles variable-length data with no problems. Alternatively, if you know the maximum length of the data you want to load into the table, you allocate a single block of memory and set your array of pointers to evenly spaced locations within that memory. The latter is much faster but at the cost of being inflexible.

Yes Steve, good points to consider.   :t
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 08, 2013, 02:04:23 AM
Hi,


; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hash_rec STRUC
HCode   DW      ?       ; Code of the symbol in slot i, or 0 if empty slot.
HPrefix DW      ?       ; Symbol's prefix code; undefined if empty slot.
HSuffix DB      ?       ; Symbol Suffix character.
Hash_rec        ENDS

HSize   EQU     5120    ; Hash table size for ~80% occupancy.  Not prime
                        ; As AoCP wants.

LZWHash Hash_rec  HSize DUP(<>) ; syntax for a structure.

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; 27 March 2008, make a hash to index the LZW token array.  Use logic
; similar to Unix COMPRESS code.  Make a separate routine for cleaner
; code, and possibility to use a different algorithm later on.
INDEX:
        PUSH    BX      ; Not sure if this is necessary...
        PUSH    CX      ; But protect yourself while debugging.

        MOV     SI,AX   ; Move character into Source Index register.
        MOV     CX,4
        SHL     SI,CL   ; Shift to make comparable to table size.
        ADD     SI,[Prefix] ; Make hash dependent on both prefix and character.
        CMP     SI,HSize ; Too big?
        JB      Idx1
        SUB     SI,HSize ; Yep.
Idx1:
; Multiply hash number by record size.
        MOV     BX,SI   ; Five bytes per table entry.
        SHL     SI,1
        SHL     SI,1
        ADD     SI,BX
        ADD     SI,OFFSET LZWHash       ; And point into the table

        POP     CX
        POP     BX

        RET
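For readers who prefer HLL code, the index calculation in the routine above can be rendered roughly in C (a sketch of the same folding logic; the MASM version additionally scales the result by the 5-byte record size and adds the table base, which C does implicitly through array indexing):

```c
#define HSIZE 5120              /* table size from the posted code */

/* Mirrors the MASM Hash_rec STRUC (5 bytes per record). */
struct Hash_rec {
    unsigned short code;        /* 0 when the slot is empty */
    unsigned short prefix;
    unsigned char  suffix;
};

struct Hash_rec lzw_hash[HSIZE];

/* Fold a (character, prefix) pair into a table index the way INDEX
   does: shift the character left 4 bits, add the prefix, and wrap
   once if the sum runs past the table.  A single subtraction is
   enough because (255 << 4) + 4095 < 2 * HSIZE. */
unsigned lzw_index(unsigned ch, unsigned prefix)
{
    unsigned i = (ch << 4) + prefix;
    if (i >= HSIZE)
        i -= HSIZE;
    return i;                   /* &lzw_hash[i] replaces the scale-by-5 */
}
```

The shift-left-by-1-twice-plus-original sequence in the assembly is just the multiply-by-5 that C's `lzw_hash[i]` performs automatically.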
Title: Re: The shadow of a binary tree
Post by: frktons on February 08, 2013, 04:36:21 AM
Quote from: FORTRANS on February 08, 2013, 02:04:23 AM
Hi,


; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Hash_rec STRUC
HCode   DW      ?       ; Code of the symbol in slot i, or 0 if empty slot.
HPrefix DW      ?       ; Symbol's prefix code; undefined if empty slot.
HSuffix DB      ?       ; Symbol Suffix character.
Hash_rec        ENDS

HSize   EQU     5120    ; Hash table size for ~80% occupancy.  Not prime
                        ; As AoCP wants.

LZWHash Hash_rec  HSize DUP(<>) ; syntax for a structure.

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; 27 March 2008, make a hash to index the LZW token array.  Use logic
; similar to Unix COMPRESS code.  Make a separate routine for cleaner
; code, and possibility to use a different algorithm later on.
INDEX:
        PUSH    BX      ; Not sure if this is necessary...
        PUSH    CX      ; But protect yourself while debugging.

        MOV     SI,AX   ; Move character into Source Index register.
        MOV     CX,4
        SHL     SI,CL   ; Shift to make comparable to table size.
        ADD     SI,[Prefix] ; Make hash dependent on both prefix and character.
        CMP     SI,HSize ; Too big?
        JB      Idx1
        SUB     SI,HSize ; Yep.
Idx1:
; Multiply hash number by record size.
        MOV     BX,SI   ; Five bytes per table entry.
        SHL     SI,1
        SHL     SI,1
        ADD     SI,BX
        ADD     SI,OFFSET LZWHash       ; And point into the table

        POP     CX
        POP     BX

        RET


Adapting it to 32-bit code should not be difficult.
Is it just a matter of replacing 16-bit regs with their 32-bit counterparts,
or is there some trick to be aware of?

Thanks anyway
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 08, 2013, 05:49:53 AM
Hi,

Quote from: frktons on February 08, 2013, 04:36:21 AM
Adapting it to 32-bit code should not be difficult.
Is it just a matter of replacing 16-bit regs with their 32-bit counterparts,
or is there some trick to be aware of?

   No tricks really.  The operations here only use 13-bits until
you add in the offset to the table.  Of course GIF files were
defined  to be easy to create, often on 8-bit computers.
When going to 32-bits you have a lot more resources available.
If your hash table is much larger, you may want to use a more
sophisticated hash algorithm as this one can take a little while
to "warm up" in a few situations.  It does not matter in this
application, as it will still work due to the successful processing
of collisions.  And in practice, it seems to work well.

   Note that the collision processing generally requires a second
hash function to look for an unused table entry.  But in my
case a really simple one works.  Brute force.  If you get a really
large hash table that is heavily used you might want to read up
on the theory in Knuth's book or similar.

   As you can see in the code comments, I thought I might need
a better hash function as this one looks simple.  But, in testing
it, it worked well.

Regards,

Steve N.
Title: Re: The shadow of a binary tree
Post by: frktons on February 08, 2013, 05:56:27 AM
Quote from: FORTRANS on February 08, 2013, 05:49:53 AM

   As you can see in the code comments, I thought I might need
a better hash function as this one looks simple.  But, in testing
it, it worked well.

Regards,

Steve N.

Steve, did you test it to see how many hashes per second it generates?
Only by testing performance can you decide whether it is fast enough
or whether you would do better implementing a new one.

Frank
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 08, 2013, 07:32:05 AM
Hi Frank,

   No.  I tested for evidence of good or bad behavior.  I looked
to see how the table filled up.  If it filled one part too much, it
would have too many collisions.  The entries were spread out
fairly well.  I just looked at the table using DEBUG after letting
it run for a while.  If it had looked bad, I would either generate
statistics on the collisions or just use some other hash function
suggested by Knuth or Nelson or smack this one till it looked
okay.  Hm, statistics?  Might be fun to look at.

   Generating the hash table index is so small compared
to the rest of the compression code that speed was never a
question.  The other algorithms I looked at use either a divide
or a multiply or both.  Those would have to be slower than an
add.  (At least on my older computers.)  The program is fast
enough on the computers that I use for programming.  Though
the rotate in this one might hurt a bit.  I did a quick check to
confirm that a rotate is better than a multiply on older CPUs.

Cheers,

Steve N.
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 09, 2013, 07:34:08 AM
Hi,

   Should have said shift rather than rotate above.

   Well, I put some code in to see how many collisions occur.
The first test got 16088 collisions for 100318 indices, or 16%.
A second test was run to check what I hoped would be an
improvement in the algorithm.  The number of collisions was
5683 with the newer version and 2656 with the posted
version; the number of indices was 801773, so 0.7% and
0.3%.  The major difference is that the first test was on a 256
color file and the second on a 16 color one.

Cheers,

Steve N.
Title: Re: The shadow of a binary tree
Post by: frktons on February 09, 2013, 10:24:59 AM
Quote from: FORTRANS on February 09, 2013, 07:34:08 AM
Hi,

   Should have said shift rather than rotate above.

   Well, I put some code in to see how many collisions occur.
The first test got 16088 collisions for 100318 indices, or 16%.
A second test was run to check what I hoped would be an
improvement in the algorithm.  The number of collisions was
5683 with the newer version and 2656 with the posted
version; the number of indices was 801773, so 0.7% and
0.3%.  The major difference is that the first test was on a 256
color file and the second on a 16 color one.

Cheers,

Steve N.

Not bad Steve.   :t
In my case, working on a text file, I'll probably hit about the same
number of collisions as the 256 color case, since 256 is
the number of ANSI codes.
Solving the "collision problem" in a fast way should be the key to
a good hash algo. It is the most "time consuming" part, as far as
I can foresee.
Title: Re: The shadow of a binary tree
Post by: hutch-- on February 09, 2013, 12:08:57 PM
Frank,

With a decent hash table, the slowest part will always be the number generation. If your collision detection is done properly, it does not noticeably slow down until you reach something like 80% table saturation, which you should never hit if you allocate a table big enough for your data. A rough rule is that at 50% saturation you will barely see any slowdown. Now, there are a few tricks to improve the distribution along the table: make the collision stepping dependent on the data length so that each entry's step is potentially different from the last one; this gives you a better chance of finding an empty slot, since you do not keep stepping the same amount each time.

The action will always be in how you hash the source data: the more unique the result, the fewer collisions you get. One of the most common techniques is to multiply, then return the remainder, as it will be in the range of your table size. It usually involves getting a number for each character in your source data, and there are a number of ways to do this; a lookup table for each character position allows you to use a table of prime numbers so that you do not get even stepping and a reduced range of results.
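The multiply-then-remainder scheme with per-position prime weights might be sketched like this in C (the particular primes, table size, and names are arbitrary illustrative choices of mine, not a specific known hash):

```c
#define TABLE_SIZE 65537        /* a prime comfortably above the expected word count */

/* Position-dependent prime weights, so that anagrams such as
   "dog" and "god" hash differently; the particular primes are an
   arbitrary illustrative choice. */
static const unsigned primes[8] = { 31, 37, 41, 43, 47, 53, 59, 61 };

/* Multiply-and-remainder hash: accumulate each character with a
   per-position prime multiplier, then fold the result into the
   table range with the remainder operation. */
unsigned hash_word(const char *s)
{
    unsigned h = 0, pos = 0;
    while (*s) {
        h = h * primes[pos & 7] + (unsigned char)*s++;
        pos++;
    }
    return h % TABLE_SIZE;
}
```

Because the remainder is taken against a prime table size, the results spread over the whole table rather than clustering on even steps.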
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 10, 2013, 03:35:26 AM
Quote from: frktons on February 09, 2013, 10:24:59 AM
Not bad Steve.

Hi Frank,

   Well, 16% is a bit high, though livable.  Less than 1%
is great.  I think the large amount of background color in
that image may be a factor.  The 16 color image has a
non-uniform background.

Quote
In my case, working on a text file, I'll probably hit about the same
number of collisions as the 256 color case, since 256 is
the number of ANSI codes.

   Actually, text is skewed toward fewer symbols.  English text
or programs have letters, numbers, punctuation, white space, and
a few symbols.  And the letters are skewed towards things like
vowels and lower case, and away from capital letters, q's, and z's.
So maybe a hundred symbols total and thirty common ones.  I just
looked at a large assembly program of mine, and about a third of
it is just spaces, CR, and LF.  (I don't use tabs.)
It does use all 256 codes though.

Quote
Solving the "collision problem" in a fast way should be the key to
a good hash algo. It is the most "time consuming" part, as far as
I can foresee.

   Two comparisons and two conditional jumps in this case to
determine whether a collision occurs.  And then make a change in
the index and check again.  Almost nothing compared to the bit
mashing and record keeping needed to create the variable-length
LZW tokens and put them in a file.

   Hutch, good comments.  Knuth used 80% as the saturation
point, as that is where his programs started to be adversely
affected.  And he wrote the book (at least the edition I have)
when computers were much smaller, so space was often a
trade-off with speed.

   I may do some further tests...

Regards,

Steve N.
Title: Re: The shadow of a binary tree
Post by: frktons on February 10, 2013, 09:30:16 AM
Thanks Steve I and II.  :biggrin:

I'm studying a little of this stuff before choosing what kind of algorithm
to implement. Better to have a clear idea of what to do, otherwise I'm
not able to start coding.

I'll post my ideas as they become clean and organized.

Frank
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 11, 2013, 08:17:38 AM
Hi,

   Ran some tests creating 256 color GIF files. These were
four or five times larger than the one I mentioned above.
Most had 10 to 12 % collisions.  But one interesting data
point was an upside down picture had 21% collisions.
When I flipped the image over, it had only 11%.  Sheesh.

Cheers,

Steve N.
Title: Re: The shadow of a binary tree
Post by: frktons on February 11, 2013, 08:29:54 AM
Quote from: FORTRANS on February 11, 2013, 08:17:38 AM
Hi,

   Ran some tests creating 256 color GIF files. These were
four or five times larger than the one I mentioned above.
Most had 10 to 12 % collisions.  But one interesting data
point was an upside down picture had 21% collisions.
When I flipped the image over, it had only 11%.  Sheesh.

Cheers,

Steve N.

This looks a bit odd. I don't know what this could mean...
By the way, are you still using the 16-bit code, or have you
updated it to 32-bit?

Frank
Title: Re: The shadow of a binary tree
Post by: FORTRANS on February 11, 2013, 09:38:12 AM
Hi Frank,

Quote from: frktons on February 11, 2013, 08:29:54 AM
This looks a bit odd. I don't know what this could mean...

   If you mean 21% to 10% collisions, it means that the LZW
algorithm combined with my hash algorithm is quite data
dependent.  Some patterns are worse than others.  I mentioned
this to see if anyone else has seen something similar.

Quote
By the way, are you still using the 16-bit code, or have you
updated it to 32-bit?

   No, it is still 16-bit.  The arithmetic fits into word- and
byte-sized variables.  The only 32-bit item in the code is a DWORD
(really two words) used to count the indices for reporting the
collision information.  And the code is ugly.  If I upgraded to
32-bit code I should probably just start over.  The executable is
about 33.5 KB, of which 25.6 KB is the data for the hash table.
And it runs on my 80186 computer if needed.  Hm, haven't tried
that recently.

   I have written some 32-bit code.  Some as subroutines for
16-bit programs.  Some as modifications to programs that
people have posted here in the forum.

Cheers,

Steve N.
Title: Re: The shadow of a binary tree
Post by: hutch-- on February 11, 2013, 10:01:18 AM
Frank,

In the example code in MASM32 there is a hash table example for performing word replacements on text data. It loads a word list into the table then replaces each word in the source that is in the loaded table with its replacement in the loaded table. This example is one of the simpler ones that uses fixed memory and it does the basic things that a hash table needs to do.

\masm32\examples\advanced\wrep
Title: Re: The shadow of a binary tree
Post by: frktons on February 11, 2013, 10:28:26 AM
Quote from: FORTRANS on February 11, 2013, 09:38:12 AM
   If you mean 21% to 10% collisions, it means that the LZW
algorithm combined with my hash algorithm is quite data
dependent.  Some patterns are worse than others.  I mentioned
this to see if anyone else has seen something similar.

...  If I upgraded to 32-bit code I should probably just start
over.  The executable is about 33.5 KB, of which 25.6 KB is the
data for the hash table.  And it runs on my 80186 computer if
needed.  Hm, haven't tried that recently.

Steve N.

Ok Steve, it was just a curiosity.  :biggrin:

Quote from: hutch-- on February 11, 2013, 10:01:18 AM
Frank,

In the example code in MASM32 there is a hash table example for performing word replacements on text data. It loads a word list into the table then replaces each word in the source that is in the loaded table with its replacement in the loaded table. This example is one of the simpler ones that uses fixed memory and it does the basic things that a hash table needs to do.

\masm32\examples\advanced\wrep

Thanks Steve, I'll have a look at it.  :t