Author Topic: Bug in Boyer-Moore BM.asm ?  (Read 3587 times)

Rav

  • Regular Member
  • *
  • Posts: 36
Re: Bug in Boyer-Moore BM.asm ?
« Reply #15 on: May 08, 2022, 12:22:31 AM »
In my case I'm not invoking any len-type function at all, in other words I'm not calculating the length at all, because my strings already have a length doubleword attached right to them (in a header).  So I'm just copying the length doubleword stored in my strings header to the BM programs length doubleword, like this:

    mov eax,vTXTnelm     ; Number of elements in my source text string.
    mov srcLngth,eax       ; Copy that to the variable that the BM program uses.

And yes, I am PUSHing eax before I do that and POPing it afterwards so as not to destroy it.

Or am I still missing something?


I just saw your latest post re "solved the issue with the -1 returned for the first two "len calculated" runs."  I still don't understand how the code can return a different result just because in one case you're passing in a register which contains the calculated length, and in the other you're passing in the calculated length directly.

It's the order. When passing len(pString) as an argument, as in...

invoke BMBinSearch, 0, Search$, len(Search$), sub$, len(sub$)

... one would assume the following code:
Code: [Select]
  invoke somelenfunction, addr sub$
  push eax
  push sub$
  invoke somelenfunction, addr Search$
  push Search$
  push 0
  call BMBinSearch

Compare that to the disassembly above, and you'll see that the order is wrong: eax gets trashed. Normally, the assembler would throw an error, but here it doesn't.

jj2007

  • Member
  • *****
  • Posts: 13337
  • Assembly is fun ;-)
    • MasmBasic
Re: Bug in Boyer-Moore BM.asm ?
« Reply #16 on: May 08, 2022, 12:36:16 AM »
Your method is fine, no problem with that :thumbsup:

Rav

  • Regular Member
  • *
  • Posts: 36
Re: Bug in Boyer-Moore BM.asm ?
« Reply #17 on: May 08, 2022, 01:53:39 AM »
OP here.  I guess the net of all this is, is there anyone here who can fix the sometimes incorrect results from the two Boyer-Moore string search algorithm implementations in the MASM32 SDK?  These are BMBinSearch (Boyer-Moore, in file bm.asm) and BMHBinsearch (Boyer-Moore-Horspool, in file bmh.asm).  I wish I could do it myself but I don't understand the algorithm itself.  If they don't get fixed (or can't be) I'll just use something else not quite as fast, but of course I'd prefer it if they were.

Or can anyone point me to a MASM-compatible Boyer-Moore assembly (x86-32 bit) implementation elsewhere?  I've Googled around but can't find one (except on this web site of course).  It has to run in 32-bit processor mode, but it can use SIMD instructions (if I understand correctly, those are allowed in 32-bit mode if you use the .XMM directive).

Thanks / Rav

jj2007

  • Member
  • *****
  • Posts: 13337
  • Assembly is fun ;-)
    • MasmBasic
Re: Bug in Boyer-Moore BM.asm ?
« Reply #18 on: May 08, 2022, 02:08:36 AM »
Another one, using the content of \Masm32\include\msvcrt.inc and _imp__wscanf. The Horspool variant does ok:

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$=          ; -----...
sub$=           _imp__wscanf

369 ms for BMBinSearch, len calculated, result=50522
695 ms for BMHBinsearch, len calculated, result=61855
158 ms for BMBinSearch, len supplied, result=50522
483 ms for BMHBinsearch, len supplied, result=61855
630 ms for MasmBasic Instr_, result=61856
765 ms for Masm32 SDK InString, result=0
247 ms for MasmBasic FAST Instr_, len supplied, result=61856

368 ms for BMBinSearch, len calculated, result=50522
689 ms for BMHBinsearch, len calculated, result=61855
159 ms for BMBinSearch, len supplied, result=50522
487 ms for BMHBinsearch, len supplied, result=61855
624 ms for MasmBasic Instr_, result=61856
743 ms for Masm32 SDK InString, result=0
255 ms for MasmBasic FAST Instr_, len supplied, result=61856

The normal BM variant has a problem:
   Let sub$="_imp__wscanf"  ; returns wrong result
   Let sub$="imp__wscanf"    ; returns good result

But the Horspool variant is consistently much slower:
Code: [Select]
460 ms for BMBinSearch, len calculated, result=61856
668 ms for BMHBinsearch, len calculated, result=61856
231 ms for BMBinSearch, len supplied, result=61856
458 ms for BMHBinsearch, len supplied, result=61856
616 ms for MasmBasic Instr_, result=61857
754 ms for Masm32 SDK InString, result=0
167 ms for MasmBasic FAST Instr_, len supplied, result=61857

Attention:
Boyer-Moore:
Quote
The functions return either the zero based offset of the pattern being searched for or -1 if the pattern does not occur within the data being searched
Normal InString:
Quote
If the function succeeds, it returns the 1 based index of the start of the substring or 0 if no match is found.

I wonder why the SDK InString doesn't work with a simple invoke InString, 1, Search$, sub$ :sad:

Rav

  • Regular Member
  • *
  • Posts: 36
Re: Bug in Boyer-Moore BM.asm ?
« Reply #19 on: May 08, 2022, 04:07:54 AM »
Jochen (jj2007) -- I downloaded and installed MasmBasic and ran RichMasm.exe, and found the documentation for Instr_(), but I can't for the life of me figure out how to extract a (hopefully) self-contained .ASM file containing the implementation of Instr_, similar to the BM*.asm files in the MASM32 SDK.  From the timings you've shown it looks like Instr_ is a fast string search algorithm, which would suit my needs.  IS it available as a separate .ASM file that developers can use in their own programs?  And does it run in 32-bit processor mode?  Thanks.


Another one, using the content of \Masm32\include\msvcrt.inc and _imp__wscanf. The Horspool variant does ok:

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$=          ; -----...
sub$=           _imp__wscanf

369 ms for BMBinSearch, len calculated, result=50522
695 ms for BMHBinsearch, len calculated, result=61855
158 ms for BMBinSearch, len supplied, result=50522
483 ms for BMHBinsearch, len supplied, result=61855
630 ms for MasmBasic Instr_, result=61856
765 ms for Masm32 SDK InString, result=0
247 ms for MasmBasic FAST Instr_, len supplied, result=61856

368 ms for BMBinSearch, len calculated, result=50522
689 ms for BMHBinsearch, len calculated, result=61855
159 ms for BMBinSearch, len supplied, result=50522
487 ms for BMHBinsearch, len supplied, result=61855
624 ms for MasmBasic Instr_, result=61856
743 ms for Masm32 SDK InString, result=0
255 ms for MasmBasic FAST Instr_, len supplied, result=61856

The normal BM variant has a problem:
   Let sub$="_imp__wscanf"  ; returns wrong result
   Let sub$="imp__wscanf"    ; returns good result

But the Horspool variant is consistently much slower:
Code: [Select]
460 ms for BMBinSearch, len calculated, result=61856
668 ms for BMHBinsearch, len calculated, result=61856
231 ms for BMBinSearch, len supplied, result=61856
458 ms for BMHBinsearch, len supplied, result=61856
616 ms for MasmBasic Instr_, result=61857
754 ms for Masm32 SDK InString, result=0
167 ms for MasmBasic FAST Instr_, len supplied, result=61857

Attention:
Boyer-Moore:
Quote
The functions return either the zero based offset of the pattern being searched for or -1 if the pattern does not occur within the data being searched
Normal InString:
Quote
If the function succeeds, it returns the 1 based index of the start of the substring or 0 if no match is found.

I wonder why the SDK InString doesn't work with a simple invoke InString, 1, Search$, sub$ :sad:

jj2007

  • Member
  • *****
  • Posts: 13337
  • Assembly is fun ;-)
    • MasmBasic
Re: Bug in Boyer-Moore BM.asm ?
« Reply #20 on: May 08, 2022, 04:13:56 AM »
Jochen (jj2007) -- I downloaded and installed MasmBasic and ran RichMasm.exe, and found the documentation for Instr_(), but I can't for the life of me figure out how to extract a (hopefully) self-contained .ASM file containing the implementation of Instr_, similar to the BM*.asm files in the MASM32 SDK.

You can statically link to InstrCi in \Masm32\MasmBasic\MasmBasic.lib

The source is about 400 lines of highly idiosyncratic code, and I am sorry but it's a closed source, because I don't want it to see copied & pasted all over the web.

Rav

  • Regular Member
  • *
  • Posts: 36
Re: Bug in Boyer-Moore BM.asm ?
« Reply #21 on: May 08, 2022, 04:27:05 AM »
Understood.  I'm imagining (guessing really) that it might be using XMM registers and the PCMPESTRI (or related) instruction.  At any rate, using PCMPESTRI is also something I'm going to research, if I can figure the silly thing out!  Thanks again. / Rav


Jochen (jj2007) -- I downloaded and installed MasmBasic and ran RichMasm.exe, and found the documentation for Instr_(), but I can't for the life of me figure out how to extract a (hopefully) self-contained .ASM file containing the implementation of Instr_, similar to the BM*.asm files in the MASM32 SDK.

You can statically link to InstrCi in \Masm32\MasmBasic\MasmBasic.lib

The source is about 400 lines of highly idiosyncratic code, and I am sorry but it's a closed source, because I don't want it to see copied & pasted all over the web.

jj2007

  • Member
  • *****
  • Posts: 13337
  • Assembly is fun ;-)
    • MasmBasic
Re: Bug in Boyer-Moore BM.asm ?
« Reply #22 on: May 08, 2022, 04:48:24 AM »
PCMPESTRI is SSE4.2; until now, I haven't gone beyond SSE2, in order to remain compatible to older CPUs. Check pcmpeqb if you are eager to roll your own ;-)

Vortex

  • Member
  • *****
  • Posts: 2724
Re: Bug in Boyer-Moore BM.asm ?
« Reply #23 on: May 08, 2022, 05:14:50 AM »
Hi Rav,

BMHBinsearch is reporting -1.

Rav

  • Regular Member
  • *
  • Posts: 36
Re: Bug in Boyer-Moore BM.asm ?
« Reply #24 on: May 08, 2022, 05:25:48 AM »
Hi Rav,

BMHBinsearch is reporting -1.

Thank you for checking.  Unfortunately that's another confirmation that BMHBinsearch sometimes returns the wrong result (in addition to BMBinsearch with certain other strings).

jj2007

  • Member
  • *****
  • Posts: 13337
  • Assembly is fun ;-)
    • MasmBasic
Re: Bug in Boyer-Moore BM.asm ?
« Reply #25 on: May 08, 2022, 05:46:52 AM »
Hi Erol,

Can you get Masm32 SDK InString to work with these strings?

Code: [Select]
Let Search$=FileRead$("\Masm32\include\msvcrt.inc")  ; content of this file
Let sub$="_imp__wscanf" ; "_imp__wscanf" produces wrong result with BMBinSearch
invoke InString, 1, Search$, sub$   ; returns 0

Vortex

  • Member
  • *****
  • Posts: 2724
Re: Bug in Boyer-Moore BM.asm ?
« Reply #26 on: May 08, 2022, 06:20:09 AM »
Hi Jochen,

Here is the code :

Code: [Select]
include     \masm32\include\masm32rt.inc

.data

filename    db '\Masm32\include\msvcrt.inc',0
subs        db '_imp__wscanf',0
msg         db '1 based index of the start of the substring = %u',0

.data?

pMem        dd ?
size1       dd ?

.code

start:

    invoke  read_disk_file,ADDR filename,\
            ADDR pMem,ADDR size1
           
    invoke  InString,1,pMem,ADDR subs

    invoke  crt_printf,ADDR msg,eax

    invoke  GlobalFree,pMem

    invoke  ExitProcess,0

END start

The result is correct and verified with a hex editor reporting the offset of the substring :

Code: [Select]
1 based index of the start of the substring = 61856

jj2007

  • Member
  • *****
  • Posts: 13337
  • Assembly is fun ;-)
    • MasmBasic
Re: Bug in Boyer-Moore BM.asm ?
« Reply #27 on: May 08, 2022, 06:43:55 AM »
Thanks a lot, Erol - your code works, and made me find my bug :thumbsup:

Rav

  • Regular Member
  • *
  • Posts: 36
Re: Bug in Boyer-Moore BM.asm ?
« Reply #28 on: May 08, 2022, 07:53:51 AM »
OP here.  I think I've found a particular string and substring pair which causes the BMBinSearch version of the Boyer-Moore string searching program (\MASM32\M32LIB\bm.asm) to go into an endless loop, thus hanging and never returning, effectively crashing.  At least here that's what it does.

String: '94764' (length 5)
Substring: '64' (length 2)

Jochen/Vortex/whoever, if you have a moment can you try these strings with that routine and verify that it hangs for you?  Thanks / Rav

jj2007

  • Member
  • *****
  • Posts: 13337
  • Assembly is fun ;-)
    • MasmBasic
Re: Bug in Boyer-Moore BM.asm ?
« Reply #29 on: May 08, 2022, 08:03:29 AM »
Confirmed :cool:

Code: [Select]
include \masm32\include\masm32rt.inc ; plain Masm32 for the fans of pure assembler

.code
start: cls
print "press Ctrl C to stop this", 13, 10
print "trying BMHBinsearch, 0, '947xx', 5, 'xx', 2", 13, 10
print str$(rv(BMHBinsearch, 0, "947xx", 5, "xx", 2)), " is the result for BMHBinsearch", 13, 10
print "trying BMBinSearch, 0, '947xx', 5, 'xx', 2", 13, 10
print str$(rv(BMBinSearch, 0, "947xx", 5, "xx", 2)), " is the result for BMBinSearch", 13, 10
MsgBox 0, "OK", "You won't see this box:", MB_OK
exit
end start