News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Bug in Boyer-Moore BM.asm ?

Started by Rav, May 07, 2022, 06:13:41 AM

Previous topic - Next topic

Rav

Greetings.  My first post here.  I'm an APL programmer who also likes to explore assembly language programming, and I've written a few small routines to speed up certain areas of the apps I write.  I'm running Windows 10 Pro and using Visual Studio 2022.  ML.EXE is version 14.31.31107.0.

The first thing I wanted to do after discovering the MASM32 SDK was to incorporate the Boyer-Moore algorithm for string searching into my app (yes, you can directly call assembled machine language from APL).  At any rate, after I incorporated BM.asm from C:\masm32\m32lib, I did some testing to make sure it was working.  At first, everything seemed to work correctly, getting the correct offset returned.  But then I found a source string and substring which returned the wrong offset.  When the source string is 'xx includix' and the substring is 'did', it returns 8, when it should have returned -1 (for not found).

I must point out that, being somewhat new to MASM, I couldn't figure out how to use INVOKE to pass in the arguments when testing.  So what I did was modify BM.ASM, removing the passed-in arguments where the proc statement is (startpos:DWORD, lpSource:DWORD,srcLngth:DWORD, lpSubStr:DWORD,subLngth:DWORD) and then hard-coding them with DW's in the .data segment and then loading them in the .code segment (so for example, I put    mysource DB 'xx includix'  in the .data segment and then added    mov lpSource,offset mysource   in the .code segment).  I hope this makes sense.  I hope that doing things this way hasn't in some way CAUSED the erroneous result.

Can anyone verify this bug?  And if there is indeed a bug, can it be fixed?  Thanks.  / Rav


Vortex

Hi Rav,

Welcome to the forum. Could you post here your code?

Rav

I hope I've inserted the code correctly (below).  I started with BM.ASM from MASM32\M32lib.  I've noted my modifications with ***.  Their only purpose was to allow me to test since I didn't know how to INVOKE the procedure with arguments.  In Visual Studio, I ran "Local Window Debugger," which assembled successfully and ran to completion.  It displayed the message

The thread 0x8ee0 has exited with code 8 (0x8).
The program '[19396] Test.exe' has exited with code 0 (0x0).

I verified that register eax (the result register) contained 8.  It should have contained -1, meaning not found, and the message should have been "... code 4294967295 (0xffffffff)."

If my modifications are causing the "bug" I'd appreciate knowing what I did wrong.  Thanks for your help.  / Rav




; #########################################################################

    .486
    .model flat, stdcall  ; 32 bit memory model
    option casemap :none  ; case sensitive

    .data

; *** MODS ***
    mystartpos  DD   0
    mysource    DB  'xx includix'
    mysrcLngth  DD  11
    mysubstr    DB  'did'
    mysubLngth  DD  3

    startpos DD ?
    lpSource DD ?
    srcLngth DD ?
    lpSubStr DD ?
    subLngth DD ?
; *** MODS ***


    .code

; #########################################################################

align 16


; *** MOD ***
mainCRTStartup proc

; *** Originally ***
; *** BMBinSearch proc startpos:DWORD,
; ***                  lpSource:DWORD,srcLngth:DWORD,
; ***                  lpSubStr:DWORD,subLngth:DWORD


  ; -----------------------------------------------------------------
  ; This version uses four heuristics, it determines the shift type
  ; from the character in the table, if the character is not in the
  ; pattern, it performs a BAD CHARACTER shift, if the character is
  ; in the pattern, it determines if it is the first comparison after
  ; the shift and adds the GOOD SUFFIX shift to the location counter.
  ; If the comparison is not the first after the shift, it calculates
  ; the GOOD SUFFIX shift and adds it to the location counter.
  ; -----------------------------------------------------------------

    LOCAL cval   :DWORD
    LOCAL shift_table[256]:DWORD

*** MODS ***
push eax
    mov eax,mystartpos
    mov startpos,eax
    mov lpSource,offset mysource
    mov eax,mysrcLngth
    mov srcLngth,eax
    mov lpSubStr,offset mysubstr
    mov eax,mysubLngth
    mov subLngth,eax
pop eax
*** MODS ***

    push ebx
    push esi
    push edi

    mov ebx, subLngth

    cmp ebx, 1
    jg @F
    mov eax, -2                 ; string too short, must be > 1
    jmp Cleanup
  @@:

    mov esi, lpSource
    add esi, srcLngth
    sub esi, ebx
    mov edx, esi            ; set Exit Length

  ; ----------------------------------------
  ; load shift table with value in subLngth
  ; ----------------------------------------
    mov ecx, 256
    mov eax, ebx
    lea edi, shift_table
    rep stosd

  ; ----------------------------------------------
  ; load decending count values into shift table
  ; ----------------------------------------------
    mov ecx, ebx                ; SubString length in ECX
    dec ecx                     ; correct for zero based index
    mov esi, lpSubStr           ; address of SubString in ESI
    lea edi, shift_table

    xor eax, eax

  Write_Shift_Chars:
    mov al, [esi]               ; get the character
    inc esi
    mov [edi+eax*4], ecx        ; write shift for each character
    dec ecx                     ; to ascii location in table
    jnz Write_Shift_Chars

  ; -----------------------------
  ; set up for main compare loop
  ; -----------------------------
    mov ecx, ebx
    dec ecx
    mov cval, ecx

    mov esi, lpSource
    mov edi, lpSubStr
    add esi, startpos           ; add starting position

    jmp Pre_Loop

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Calc_Suffix_Shift:
    add eax, ecx
    sub eax, cval               ; sub loop count
    jns Add_Suffix_Shift
    mov eax, 1                  ; minimum shift is 1

  Add_Suffix_Shift:
    add esi, eax                ; add SUFFIX shift
    mov ecx, cval               ; reset counter in compare loop

  Test_Length:
    cmp edx, esi                ; test exit condition
    jl No_Match

  Pre_Loop:
    xor eax, eax                ; zero EAX for following partial writes
    mov al, [esi+ecx]
    cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
    je @F
    mov eax, shift_table[eax*4]
    cmp ebx, eax
    jne Add_Suffix_Shift        ; bypass SUFFIX calculations
    lea esi, [esi+ecx+1]        ; add BAD CHAR shift
    jmp Test_Length
  @@:
    dec ecx
    xor eax, eax                ; zero EAX for following partial writes

  Cmp_Loop:
    mov al, [esi+ecx]
    cmp al, [edi+ecx]           ; cmp characters in ESI / EDI
    jne Set_Shift               ; if not equal, get next shift
    dec ecx
    jns Cmp_Loop
    jmp Match                   ; fall through on match

  Set_Shift:
    mov eax, shift_table[eax*4]
    cmp ebx, eax
    jne Calc_Suffix_Shift       ; run SUFFIX calculations
    lea esi, [esi+ecx+1]        ; add BAD CHAR shift
    jmp Test_Length

; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

  Match:
    sub esi, lpSource           ; sub source from ESI
    mov eax, esi                ; put length in eax
    jmp Cleanup

  No_Match:
    mov eax, -1

  Cleanup:
    pop edi
    pop esi
    pop ebx

    ret

mainCRTStartup endp ; *** MOD *** Originally: BMBinSearch endp

; #########################################################################

    end


hutch--

Hi Rav,

When I wrote those a long time ago, I tested them over many millions of examples (automated) and checked them against a normal byte style search algorithm (also automated) to see if the two techniques produced the same results. A byte scanner search algo will always produce the right result but generally slower than a BM algo.

Give this algo in the library a try.

InString proc StartPos:DWORD, lpszString:DWORD, lpszSubStr:DWORD

This will tell you if you are calling the algos correctly as this is a classic byte scanner.

This is another that sets the search string length.

BinSearch proc StartPos:DWORD,lpString:DWORD,lnStrng:DWORD,
               lpSubStr:DWORD,lnSubSt:DWORD




jj2007

Hi Rav,

There is a related thread here. Welcome to the Forum :thup:

Jochen

Rav

Thanks to you all for your responses so far.  I verified that I was calling the ASM routines correctly.  I was hopeful when I looked at the thread that Jochen referenced, since it sounded like the Horspool variation of the Boyer-Moore algorithm (BMHBinsearch, in bmh.asm) was working where the "regular" Boyer-Moore algorithm (BMBinSearch, in bm.asm) wasn't.  But when I integrated the BMH version and started running verification tests on it, after only about 67,000 tests with differing random-sized string and substrings, it failed on the following:

string: 'class="sc-htoDjs cnueJ'
substring: 'lass="sc-htoDjs cnueJ'

The substring is the same as the string except it omits the first character.
It should be returning 1 (found in the second position), but is returning -1 (not found).

Can any of you execute the BMHBinsearch version using those exact strings, and verify that you also get the same incorrect result?

jj2007

Confirmed:

include \masm32\MasmBasic\MasmBasic.inc         ; download
SetGlobals Search$='class="sc-htoDjs cnueJ', sub$='lass="sc-htoDjs cnueJ'
  Init
  invoke BMBinSearch,0, Search$, Len(Search$), sub$, Len(sub$)
  deb 4, "A", eax
  invoke BMHBinsearch,0, Search$, Len(Search$), sub$, Len(sub$)
  deb 4, "B", eax
  deb 4, "C", Instr_(Search$, sub$)
EndOfCode


A       eax             -1
B       eax             -1
C       edx             2


I found something even weirder when using your strings:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
753 ms for BMBinSearch, len calculated, result=-1
730 ms for BMHBinsearch, len calculated, result=-1
816 ms for BMBinSearch, result=1   <<<<<<<<<<<<<<<
621 ms for BMHBinsearch, result=-1
321 ms for Instr_, result=2


Source & exe attached (*.asc opens in WordPad, in case you don't have MasmBasic).

Vortex

I receive the correct results with the following code :

include     \masm32\include\masm32rt.inc

.data

testdata    db 1,2,3,4,5,6,0,2,4,7,8,0
search      db 5,6,0,2
string1     db 'Address of the string = %X, Zero based offset of the pattern = %d',13,10,0

testdata2   db 'class=',34,'sc-htoDjs cnueJ',0
len1        equ $ - testdata2 - 1

search2     db 'lass=',34,'sc-htoDjs cnueJ',0
len2        equ $ - search2 - 1

.code

start:

    invoke  BMBinSearch,0,ADDR testdata,12,ADDR search,4

    invoke  crt_printf,ADDR string1,ADDR testdata,eax

    invoke  BMBinSearch,0,ADDR testdata2,len1,ADDR search2,len2

    invoke  crt_printf,ADDR string1,ADDR testdata2,eax

    invoke  ExitProcess,0

END start


Address of the string = 402000, Zero based offset of the pattern = 4
Address of the string = 402054, Zero based offset of the pattern = 1

Rav

Thanks, Vortex, but are you sure you invoked the correct BM variant?  If I understand correctly what you ran, it looks like you invoked the Boyer-Moore variant (BMBinSearch, which was the first bug I reported, with a different set of strings).  The one you should be invoking is the Boyer-Moore-Horspool variant (BMHBinsearch ).  / Rav

Quote from: Vortex on May 07, 2022, 07:25:24 PM
I receive the correct results with the following code :

include     \masm32\include\masm32rt.inc

.data

testdata    db 1,2,3,4,5,6,0,2,4,7,8,0
search      db 5,6,0,2
string1     db 'Address of the string = %X, Zero based offset of the pattern = %d',13,10,0

testdata2   db 'class=',34,'sc-htoDjs cnueJ',0
len1        equ $ - testdata2 - 1

search2     db 'lass=',34,'sc-htoDjs cnueJ',0
len2        equ $ - search2 - 1

.code

start:

    invoke  BMBinSearch,0,ADDR testdata,12,ADDR search,4

    invoke  crt_printf,ADDR string1,ADDR testdata,eax

    invoke  BMBinSearch,0,ADDR testdata2,len1,ADDR search2,len2

    invoke  crt_printf,ADDR string1,ADDR testdata2,eax

    invoke  ExitProcess,0

END start


Address of the string = 402000, Zero based offset of the pattern = 4
Address of the string = 402054, Zero based offset of the pattern = 1


Rav

Thanks for confirming that, Jochen.  Is the Instr_ you referenced the same as the InString procedure (InString.asm) that Hutch referenced?  If so, it looks like your test shows that that more basic string searching algorithm was actually faster in this particular case?

Are you also able to confirm the original bug I reported in my first post with the BMBinSearch variant (Boyer-Moore, not Boyer-Moore-Horspool), with the original set of strings (string 'xx includix' and substring 'did')?


Quote from: jj2007 on May 07, 2022, 05:00:34 PM
Confirmed:

include \masm32\MasmBasic\MasmBasic.inc         ; download
SetGlobals Search$='class="sc-htoDjs cnueJ', sub$='lass="sc-htoDjs cnueJ'
  Init
  invoke BMBinSearch,0, Search$, Len(Search$), sub$, Len(sub$)
  deb 4, "A", eax
  invoke BMHBinsearch,0, Search$, Len(Search$), sub$, Len(sub$)
  deb 4, "B", eax
  deb 4, "C", Instr_(Search$, sub$)
EndOfCode


A       eax             -1
B       eax             -1
C       edx             2


I found something even weirder when using your strings:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
753 ms for BMBinSearch, len calculated, result=-1
730 ms for BMHBinsearch, len calculated, result=-1
816 ms for BMBinSearch, result=1   <<<<<<<<<<<<<<<
621 ms for BMHBinsearch, result=-1
321 ms for Instr_, result=2


Source & exe attached (*.asc opens in WordPad, in case you don't have MasmBasic).

jj2007

Quote from: Rav on May 07, 2022, 09:28:44 PM
Thanks for confirming that, Jochen.  Is the Instr_ you referenced the same as the InString procedure (InString.asm)

They are different: InString is the routine which comes with the Installation of the Masm32 SDK; Instr_() (with the trailing understroke) is a MasmBasic command.

Attached a test with a long main string, i.e. the content of the Windows.inc include file, and "Duplicate" as sub$. There are several differences in the results that I can't explain. Regarding the BM variants, it seems they work better when the length of the main string is supplied as a variable or constant:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$=        ;;;; head...
sub$=           Duplicate

49 ms for BMBinSearch, len calculated, result=-1
49 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
330 ms for BMHBinsearch, len supplied, result=977320
723 ms for MasmBasic Instr_, result=977321
1017 ms for Masm32 SDK InString, result=0
225 ms for MasmBasic FAST Instr_, len supplied, result=977321

48 ms for BMBinSearch, len calculated, result=-1
52 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
326 ms for BMHBinsearch, len supplied, result=977320
725 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
227 ms for MasmBasic FAST Instr_, len supplied, result=977321

48 ms for BMBinSearch, len calculated, result=-1
50 ms for BMHBinsearch, len calculated, result=-1
246 ms for BMBinSearch, len supplied, result=977320
324 ms for BMHBinsearch, len supplied, result=977320
727 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
222 ms for MasmBasic FAST Instr_, len supplied, result=977321


977321 is the correct value. The FAST Instr_() variant also needs the length of the main string, which is often easy to get for long strings: you typically read them from a file...

hutch--

The design of a Boyer Moore algo is the longer the string, the fast it gets as a mismatch allows it to jump further than a classic byte scanner. There are combinations that work against a BM algo and in these cases a well written byte scanner can be faster. If you thing of where in a search string the mismatch occurs, the further along the search string that it occurs, the slower it gets. Not a common situation but it arises often enough, especially with short test code.

jj2007

New results - I solved the issue*) with the -1 returned for the first two "len calculated" runs:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$=        ;;;; head...
sub$=           Duplicate

587 ms for BMBinSearch, len calculated, result=977320
653 ms for BMHBinsearch, len calculated, result=977320
248 ms for BMBinSearch, len supplied, result=977320
328 ms for BMHBinsearch, len supplied, result=977320
732 ms for MasmBasic Instr_, result=977321
1028 ms for Masm32 SDK InString, result=0
233 ms for MasmBasic FAST Instr_, len supplied, result=977321

575 ms for BMBinSearch, len calculated, result=977320
659 ms for BMHBinsearch, len calculated, result=977320
249 ms for BMBinSearch, len supplied, result=977320
329 ms for BMHBinsearch, len supplied, result=977320
729 ms for MasmBasic Instr_, result=977321
1026 ms for Masm32 SDK InString, result=0
233 ms for MasmBasic FAST Instr_, len supplied, result=977321


*):

Quoteint 3
      mov ecx, len(sub$)
      invoke BMBinSearch, 0, Search$, len(Search$), sub$, ecx         ; works fine
      nop
      int 3
      invoke BMBinSearch, 0, Search$, len(Search$), sub$, len(sub$)   ; returns -1
      nop

  int3
  push dword ptr [ebx-7C]                  ; /Arg1 = ASCII "Duplicate"
  call szLen                               ; \BoyerMooreVsInstr_.szLen
  mov ecx, eax
  push dword ptr [ebx-80]                  ; /Arg1
  call szLen                               ; \BoyerMooreVsInstr_.szLen
  push ecx                                 ; /Arg5
  push dword ptr [ebx-7C]                  ; |Arg4
  push eax                                 ; |Arg3
  push dword ptr [ebx-80]                  ; |Arg2
  push 0                                   ; |Arg1 = 0
  call BMBinSearch                         ; \BoyerMooreVsInstr_.BMBinSearch
  nop
  int3
  push dword ptr [ebx-80]                  ; /Arg1
  call szLen                               ; \BoyerMooreVsInstr_.szLen
  push dword ptr [ebx-7C]                  ; /Arg1
  call szLen                               ; \BoyerMooreVsInstr_.szLen
  push eax                                 ; /Arg5
  push dword ptr [ebx-7C]                  ; |Arg4
  push eax                                 ; |Arg3
  push dword ptr [ebx-80]                  ; |Arg2
  push 0                                   ; |Arg1 = 0
  call BMBinSearch                         ; \BoyerMooreVsInstr_.BMBinSearch
  nop


I had naively assumed that the assembler would complain about a trashed eax :rolleyes:

Rav

I am supplying the length of both strings as variables in all my test cases, and unfortunately that isn't making any difference (still returns wrong result for certain strings).  It's odd to me that the only difference in what you're doing is either first storing the length of the string (i.e. Len(Search$)) in a variable and then passing in that variable, or you're passing Len(Search$) directly in.  I can't understand how the program could compute a different result for that.

I'll look into Instr_().  I did a timing test with InString() and while it's not as fast as either of the BM variants (which were about 3 times faster than my original string searching code that I had wanted to speed up), it's still about 2 times faster.  I'll take 2 times faster since the results seem to be more reliable (though I haven't done extensive repetitive testing yet).

PS:  I just saw your latest post re "solved the issue with the -1 returned for the first two "len calculated" runs."  I still don't understand how the code can return a different result just because in one case you're passing in a register which contains the calculated length, and in the other you're passing in the calculated length directly.

I appreciate your (and everyone elses) help with this.  / Rav


Quote from: jj2007 on May 07, 2022, 11:02:02 PM
Quote from: Rav on May 07, 2022, 09:28:44 PM
Thanks for confirming that, Jochen.  Is the Instr_ you referenced the same as the InString procedure (InString.asm)

They are different: InString is the routine which comes with the Installation of the Masm32 SDK; Instr_() (with the trailing understroke) is a MasmBasic command.

Attached a test with a long main string, i.e. the content of the Windows.inc include file, and "Duplicate" as sub$. There are several differences in the results that I can't explain. Regarding the BM variants, it seems they work better when the length of the main string is supplied as a variable or constant:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$=        ;;;; head...
sub$=           Duplicate

49 ms for BMBinSearch, len calculated, result=-1
49 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
330 ms for BMHBinsearch, len supplied, result=977320
723 ms for MasmBasic Instr_, result=977321
1017 ms for Masm32 SDK InString, result=0
225 ms for MasmBasic FAST Instr_, len supplied, result=977321

48 ms for BMBinSearch, len calculated, result=-1
52 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
326 ms for BMHBinsearch, len supplied, result=977320
725 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
227 ms for MasmBasic FAST Instr_, len supplied, result=977321

48 ms for BMBinSearch, len calculated, result=-1
50 ms for BMHBinsearch, len calculated, result=-1
246 ms for BMBinSearch, len supplied, result=977320
324 ms for BMHBinsearch, len supplied, result=977320
727 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
222 ms for MasmBasic FAST Instr_, len supplied, result=977321


977321 is the correct value. The FAST Instr_() variant also needs the length of the main string, which is often easy to get for long strings: you typically read them from a file...

jj2007

Quote from: Rav on May 08, 2022, 12:03:48 AMI just saw your latest post re "solved the issue with the -1 returned for the first two "len calculated" runs."  I still don't understand how the code can return a different result just because in one case you're passing in a register which contains the calculated length, and in the other you're passing in the calculated length directly.

It's the order. When passing len(pString) as an argument, as in...

invoke BMBinSearch, 0, Search$, len(Search$), sub$, len(sub$)

... one would assume the following code:
  invoke szLen, addr sub$
  push eax
  push sub$
  invoke szLen, addr Search$
  push Search$
  push 0
  call BMBinSearch


Compare that to the disassembly above, and you'll see that the order is wrong: eax gets trashed. Normally, the assembler would throw an error, but here it doesn't:
  invoke szLen, addr sub$
  invoke szLen, addr Search$
  push eax
  push sub$
  push eax
  push Search$
  push 0
  call BMBinSearch