Greetings. My first post here. I'm an APL programmer who also likes to explore assembly language programming, and I've written a few small routines to speed up certain areas of the apps I write. I'm running Windows 10 Pro and using Visual Studio 2022. ML.EXE is version 14.31.31107.0.
The first thing I wanted to do after discovering the MASM32 SDK was to incorporate the Boyer-Moore algorithm for string searching into my app (yes, you can directly call assembled machine language from APL). At any rate, after I incorporated BM.asm from C:\masm32\m32lib, I did some testing to make sure it was working. At first, everything seemed to work correctly, getting the correct offset returned. But then I found a source string and substring which returned the wrong offset. When the source string is 'xx includix' and the substring is 'did', it returns 8, when it should have returned -1 (for not found).
I must point out that, being somewhat new to MASM, I couldn't figure out how to use INVOKE to pass in the arguments when testing. So what I did was modify BM.ASM, removing the passed-in arguments where the proc statement is (startpos:DWORD, lpSource:DWORD,srcLngth:DWORD, lpSubStr:DWORD,subLngth:DWORD) and then hard-coding them with DW's in the .data segment and then loading them in the .code segment (so for example, I put mysource DB 'xx includix' in the .data segment and then added mov lpSource,offset mysource in the .code segment). I hope this makes sense. I hope that doing things this way hasn't in some way CAUSED the erroneous result.
Can anyone verify this bug? And if there is indeed a bug, can it be fixed? Thanks. / Rav
Hi Rav,
Welcome to the forum. Could you post here your code?
I hope I've inserted the code correctly (below). I started with BM.ASM from MASM32\M32lib. I've noted my modifications with ***. Their only purpose was to allow me to test since I didn't know how to INVOKE the procedure with arguments. In Visual Studio, I ran "Local Window Debugger," which assembled successfully and ran to completion. It displayed the message
The thread 0x8ee0 has exited with code 8 (0x8).
The program '[19396] Test.exe' has exited with code 0 (0x0).
I verified that register eax (the result register) contained 8. It should have contained -1, meaning not found, and the message should have been "... code 4294967295 (0xffffffff)."
If my modifications are causing the "bug" I'd appreciate knowing what I did wrong. Thanks for your help. / Rav
; #########################################################################
.486
.model flat, stdcall ; 32 bit memory model
option casemap :none ; case sensitive
.data
; *** MODS ***
mystartpos DD 0
mysource DB 'xx includix'
mysrcLngth DD 11
mysubstr DB 'did'
mysubLngth DD 3
startpos DD ?
lpSource DD ?
srcLngth DD ?
lpSubStr DD ?
subLngth DD ?
; *** MODS ***
.code
; #########################################################################
align 16
; *** MOD ***
mainCRTStartup proc
; *** Originally ***
; *** BMBinSearch proc startpos:DWORD,
; *** lpSource:DWORD,srcLngth:DWORD,
; *** lpSubStr:DWORD,subLngth:DWORD
; -----------------------------------------------------------------
; This version uses four heuristics, it determines the shift type
; from the character in the table, if the character is not in the
; pattern, it performs a BAD CHARACTER shift, if the character is
; in the pattern, it determines if it is the first comparison after
; the shift and adds the GOOD SUFFIX shift to the location counter.
; If the comparison is not the first after the shift, it calculates
; the GOOD SUFFIX shift and adds it to the location counter.
; -----------------------------------------------------------------
LOCAL cval :DWORD
LOCAL shift_table[256]:DWORD
*** MODS ***
push eax
mov eax,mystartpos
mov startpos,eax
mov lpSource,offset mysource
mov eax,mysrcLngth
mov srcLngth,eax
mov lpSubStr,offset mysubstr
mov eax,mysubLngth
mov subLngth,eax
pop eax
*** MODS ***
push ebx
push esi
push edi
mov ebx, subLngth
cmp ebx, 1
jg @F
mov eax, -2 ; string too short, must be > 1
jmp Cleanup
@@:
mov esi, lpSource
add esi, srcLngth
sub esi, ebx
mov edx, esi ; set Exit Length
; ----------------------------------------
; load shift table with value in subLngth
; ----------------------------------------
mov ecx, 256
mov eax, ebx
lea edi, shift_table
rep stosd
; ----------------------------------------------
; load decending count values into shift table
; ----------------------------------------------
mov ecx, ebx ; SubString length in ECX
dec ecx ; correct for zero based index
mov esi, lpSubStr ; address of SubString in ESI
lea edi, shift_table
xor eax, eax
Write_Shift_Chars:
mov al, [esi] ; get the character
inc esi
mov [edi+eax*4], ecx ; write shift for each character
dec ecx ; to ascii location in table
jnz Write_Shift_Chars
; -----------------------------
; set up for main compare loop
; -----------------------------
mov ecx, ebx
dec ecx
mov cval, ecx
mov esi, lpSource
mov edi, lpSubStr
add esi, startpos ; add starting position
jmp Pre_Loop
; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Calc_Suffix_Shift:
add eax, ecx
sub eax, cval ; sub loop count
jns Add_Suffix_Shift
mov eax, 1 ; minimum shift is 1
Add_Suffix_Shift:
add esi, eax ; add SUFFIX shift
mov ecx, cval ; reset counter in compare loop
Test_Length:
cmp edx, esi ; test exit condition
jl No_Match
Pre_Loop:
xor eax, eax ; zero EAX for following partial writes
mov al, [esi+ecx]
cmp al, [edi+ecx] ; cmp characters in ESI / EDI
je @F
mov eax, shift_table[eax*4]
cmp ebx, eax
jne Add_Suffix_Shift ; bypass SUFFIX calculations
lea esi, [esi+ecx+1] ; add BAD CHAR shift
jmp Test_Length
@@:
dec ecx
xor eax, eax ; zero EAX for following partial writes
Cmp_Loop:
mov al, [esi+ecx]
cmp al, [edi+ecx] ; cmp characters in ESI / EDI
jne Set_Shift ; if not equal, get next shift
dec ecx
jns Cmp_Loop
jmp Match ; fall through on match
Set_Shift:
mov eax, shift_table[eax*4]
cmp ebx, eax
jne Calc_Suffix_Shift ; run SUFFIX calculations
lea esi, [esi+ecx+1] ; add BAD CHAR shift
jmp Test_Length
; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Match:
sub esi, lpSource ; sub source from ESI
mov eax, esi ; put length in eax
jmp Cleanup
No_Match:
mov eax, -1
Cleanup:
pop edi
pop esi
pop ebx
ret
mainCRTStartup endp ; *** MOD *** Originally: BMBinSearch endp
; #########################################################################
end
Hi Rav,
When I wrote those a long time ago, I tested them over many millions of examples (automated) and checked them against a normal byte style search algorithm (also automated) to see if the two techniques produced the same results. A byte scanner search algo will always produce the right result but generally slower than a BM algo.
Give this algo in the library a try.
InString proc StartPos:DWORD, lpszString:DWORD, lpszSubStr:DWORD
This will tell you if you are calling the algos correctly as this is a classic byte scanner.
This is another that sets the search string length.
BinSearch proc StartPos:DWORD,lpString:DWORD,lnStrng:DWORD,
lpSubStr:DWORD,lnSubSt:DWORD
Hi Rav,
There is a related thread here (http://masm32.com/board/index.php?topic=2998.0). Welcome to the Forum :thup:
Jochen
Thanks to you all for your responses so far. I verified that I was calling the ASM routines correctly. I was hopeful when I looked at the thread that Jochen referenced, since it sounded like the Horspool variation of the Boyer-Moore algorithm (BMHBinsearch, in bmh.asm) was working where the "regular" Boyer-Moore algorithm (BMBinSearch, in bm.asm) wasn't. But when I integrated the BMH version and started running verification tests on it, after only about 67,000 tests with differing random-sized string and substrings, it failed on the following:
string: 'class="sc-htoDjs cnueJ'
substring: 'lass="sc-htoDjs cnueJ'
The substring is the same as the string except it omits the first character.
It should be returning 1 (found in the second position), but is returning -1 (not found).
Can any of you execute the BMHBinsearch version using those exact strings, and verify that you also get the same incorrect result?
Confirmed:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
SetGlobals Search$='class="sc-htoDjs cnueJ', sub$='lass="sc-htoDjs cnueJ'
Init
invoke BMBinSearch,0, Search$, Len(Search$), sub$, Len(sub$)
deb 4, "A", eax
invoke BMHBinsearch,0, Search$, Len(Search$), sub$, Len(sub$)
deb 4, "B", eax
deb 4, "C", Instr_(Search$, sub$)
EndOfCode
A eax -1
B eax -1
C edx 2
I found something even weirder when using your strings:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
753 ms for BMBinSearch, len calculated, result=-1
730 ms for BMHBinsearch, len calculated, result=-1
816 ms for BMBinSearch, result=1 <<<<<<<<<<<<<<<
621 ms for BMHBinsearch, result=-1
321 ms for Instr_, result=2
Source & exe attached (*.asc opens in WordPad, in case you don't have MasmBasic).
I receive the correct results with the following code :
include \masm32\include\masm32rt.inc
.data
testdata db 1,2,3,4,5,6,0,2,4,7,8,0
search db 5,6,0,2
string1 db 'Address of the string = %X, Zero based offset of the pattern = %d',13,10,0
testdata2 db 'class=',34,'sc-htoDjs cnueJ',0
len1 equ $ - testdata2 - 1
search2 db 'lass=',34,'sc-htoDjs cnueJ',0
len2 equ $ - search2 - 1
.code
start:
invoke BMBinSearch,0,ADDR testdata,12,ADDR search,4
invoke crt_printf,ADDR string1,ADDR testdata,eax
invoke BMBinSearch,0,ADDR testdata2,len1,ADDR search2,len2
invoke crt_printf,ADDR string1,ADDR testdata2,eax
invoke ExitProcess,0
END start
Address of the string = 402000, Zero based offset of the pattern = 4
Address of the string = 402054, Zero based offset of the pattern = 1
Thanks, Vortex, but are you sure you invoked the correct BM variant? If I understand correctly what you ran, it looks like you invoked the Boyer-Moore variant (BMBinSearch, which was the first bug I reported, with a different set of strings). The one you should be invoking is the Boyer-Moore-Horspool variant (BMHBinsearch ). / Rav
Quote from: Vortex on May 07, 2022, 07:25:24 PM
I receive the correct results with the following code :
include \masm32\include\masm32rt.inc
.data
testdata db 1,2,3,4,5,6,0,2,4,7,8,0
search db 5,6,0,2
string1 db 'Address of the string = %X, Zero based offset of the pattern = %d',13,10,0
testdata2 db 'class=',34,'sc-htoDjs cnueJ',0
len1 equ $ - testdata2 - 1
search2 db 'lass=',34,'sc-htoDjs cnueJ',0
len2 equ $ - search2 - 1
.code
start:
invoke BMBinSearch,0,ADDR testdata,12,ADDR search,4
invoke crt_printf,ADDR string1,ADDR testdata,eax
invoke BMBinSearch,0,ADDR testdata2,len1,ADDR search2,len2
invoke crt_printf,ADDR string1,ADDR testdata2,eax
invoke ExitProcess,0
END start
Address of the string = 402000, Zero based offset of the pattern = 4
Address of the string = 402054, Zero based offset of the pattern = 1
Thanks for confirming that, Jochen. Is the Instr_ you referenced the same as the InString procedure (InString.asm) that Hutch referenced? If so, it looks like your test shows that that more basic string searching algorithm was actually faster in this particular case?
Are you also able to confirm the original bug I reported in my first post with the BMBinSearch variant (Boyer-Moore, not Boyer-Moore-Horspool), with the original set of strings (string 'xx includix' and substring 'did')?
Quote from: jj2007 on May 07, 2022, 05:00:34 PM
Confirmed:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
SetGlobals Search$='class="sc-htoDjs cnueJ', sub$='lass="sc-htoDjs cnueJ'
Init
invoke BMBinSearch,0, Search$, Len(Search$), sub$, Len(sub$)
deb 4, "A", eax
invoke BMHBinsearch,0, Search$, Len(Search$), sub$, Len(sub$)
deb 4, "B", eax
deb 4, "C", Instr_(Search$, sub$)
EndOfCode
A eax -1
B eax -1
C edx 2
I found something even weirder when using your strings:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
753 ms for BMBinSearch, len calculated, result=-1
730 ms for BMHBinsearch, len calculated, result=-1
816 ms for BMBinSearch, result=1 <<<<<<<<<<<<<<<
621 ms for BMHBinsearch, result=-1
321 ms for Instr_, result=2
Source & exe attached (*.asc opens in WordPad, in case you don't have MasmBasic).
Quote from: Rav on May 07, 2022, 09:28:44 PM
Thanks for confirming that, Jochen. Is the Instr_ you referenced the same as the InString procedure (InString.asm)
They are different: InString is the routine which comes with the Installation of the Masm32 SDK; Instr_() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1153) (with the trailing understroke) is a MasmBasic command.
Attached a test with a long main string, i.e. the content of the Windows.inc include file, and "Duplicate" as sub$. There are several differences in the results that I can't explain. Regarding the BM variants, it seems they work better when the length of the main string is supplied as a variable or constant:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$= ;;;; head...
sub$= Duplicate
49 ms for BMBinSearch, len calculated, result=-1
49 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
330 ms for BMHBinsearch, len supplied, result=977320
723 ms for MasmBasic Instr_, result=977321
1017 ms for Masm32 SDK InString, result=0
225 ms for MasmBasic FAST Instr_, len supplied, result=977321
48 ms for BMBinSearch, len calculated, result=-1
52 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
326 ms for BMHBinsearch, len supplied, result=977320
725 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
227 ms for MasmBasic FAST Instr_, len supplied, result=977321
48 ms for BMBinSearch, len calculated, result=-1
50 ms for BMHBinsearch, len calculated, result=-1
246 ms for BMBinSearch, len supplied, result=977320
324 ms for BMHBinsearch, len supplied, result=977320
727 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
222 ms for MasmBasic FAST Instr_, len supplied, result=977321
977321 is the correct value. The FAST Instr_() variant also needs the length of the main string, which is often easy to get for long strings: you typically read them from a file...
The design of a Boyer Moore algo is the longer the string, the fast it gets as a mismatch allows it to jump further than a classic byte scanner. There are combinations that work against a BM algo and in these cases a well written byte scanner can be faster. If you thing of where in a search string the mismatch occurs, the further along the search string that it occurs, the slower it gets. Not a common situation but it arises often enough, especially with short test code.
New results - I solved the issue
*) with the -1 returned for the first two "len calculated" runs:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$= ;;;; head...
sub$= Duplicate
587 ms for BMBinSearch, len calculated, result=977320
653 ms for BMHBinsearch, len calculated, result=977320
248 ms for BMBinSearch, len supplied, result=977320
328 ms for BMHBinsearch, len supplied, result=977320
732 ms for MasmBasic Instr_, result=977321
1028 ms for Masm32 SDK InString, result=0
233 ms for MasmBasic FAST Instr_, len supplied, result=977321
575 ms for BMBinSearch, len calculated, result=977320
659 ms for BMHBinsearch, len calculated, result=977320
249 ms for BMBinSearch, len supplied, result=977320
329 ms for BMHBinsearch, len supplied, result=977320
729 ms for MasmBasic Instr_, result=977321
1026 ms for Masm32 SDK InString, result=0
233 ms for MasmBasic FAST Instr_, len supplied, result=977321
*):
Quoteint 3
mov ecx, len(sub$)
invoke BMBinSearch, 0, Search$, len(Search$), sub$, ecx ; works fine
nop
int 3
invoke BMBinSearch, 0, Search$, len(Search$), sub$, len(sub$) ; returns -1
nop
int3
push dword ptr [ebx-7C] ; /Arg1 = ASCII "Duplicate"
call szLen ; \BoyerMooreVsInstr_.szLen
mov ecx, eax
push dword ptr [ebx-80] ; /Arg1
call szLen ; \BoyerMooreVsInstr_.szLen
push ecx ; /Arg5
push dword ptr [ebx-7C] ; |Arg4
push eax ; |Arg3
push dword ptr [ebx-80] ; |Arg2
push 0 ; |Arg1 = 0
call BMBinSearch ; \BoyerMooreVsInstr_.BMBinSearch
nop
int3
push dword ptr [ebx-80] ; /Arg1
call szLen ; \BoyerMooreVsInstr_.szLen
push dword ptr [ebx-7C] ; /Arg1
call szLen ; \BoyerMooreVsInstr_.szLen
push eax ; /Arg5
push dword ptr [ebx-7C] ; |Arg4
push eax ; |Arg3
push dword ptr [ebx-80] ; |Arg2
push 0 ; |Arg1 = 0
call BMBinSearch ; \BoyerMooreVsInstr_.BMBinSearch
nop
I had naively assumed that the assembler would complain about a trashed eax :rolleyes:
I am supplying the length of both strings as variables in all my test cases, and unfortunately that isn't making any difference (still returns wrong result for certain strings). It's odd to me that the only difference in what you're doing is either first storing the length of the string (i.e. Len(Search$)) in a variable and then passing in that variable, or you're passing Len(Search$) directly in. I can't understand how the program could compute a different result for that.
I'll look into Instr_(). I did a timing test with InString() and while it's not as fast as either of the BM variants (which were about 3 times faster than my original string searching code that I had wanted to speed up), it's still about 2 times faster. I'll take 2 times faster since the results seem to be more reliable (though I haven't done extensive repetitive testing yet).
PS: I just saw your latest post re "solved the issue with the -1 returned for the first two "len calculated" runs." I still don't understand how the code can return a different result just because in one case you're passing in a register which contains the calculated length, and in the other you're passing in the calculated length directly.
I appreciate your (and everyone elses) help with this. / Rav
Quote from: jj2007 on May 07, 2022, 11:02:02 PM
Quote from: Rav on May 07, 2022, 09:28:44 PM
Thanks for confirming that, Jochen. Is the Instr_ you referenced the same as the InString procedure (InString.asm)
They are different: InString is the routine which comes with the Installation of the Masm32 SDK; Instr_() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1153) (with the trailing understroke) is a MasmBasic command.
Attached a test with a long main string, i.e. the content of the Windows.inc include file, and "Duplicate" as sub$. There are several differences in the results that I can't explain. Regarding the BM variants, it seems they work better when the length of the main string is supplied as a variable or constant:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$= ;;;; head...
sub$= Duplicate
49 ms for BMBinSearch, len calculated, result=-1
49 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
330 ms for BMHBinsearch, len supplied, result=977320
723 ms for MasmBasic Instr_, result=977321
1017 ms for Masm32 SDK InString, result=0
225 ms for MasmBasic FAST Instr_, len supplied, result=977321
48 ms for BMBinSearch, len calculated, result=-1
52 ms for BMHBinsearch, len calculated, result=-1
253 ms for BMBinSearch, len supplied, result=977320
326 ms for BMHBinsearch, len supplied, result=977320
725 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
227 ms for MasmBasic FAST Instr_, len supplied, result=977321
48 ms for BMBinSearch, len calculated, result=-1
50 ms for BMHBinsearch, len calculated, result=-1
246 ms for BMBinSearch, len supplied, result=977320
324 ms for BMHBinsearch, len supplied, result=977320
727 ms for MasmBasic Instr_, result=977321
1018 ms for Masm32 SDK InString, result=0
222 ms for MasmBasic FAST Instr_, len supplied, result=977321
977321 is the correct value. The FAST Instr_() variant also needs the length of the main string, which is often easy to get for long strings: you typically read them from a file...
Quote from: Rav on May 08, 2022, 12:03:48 AMI just saw your latest post re "solved the issue with the -1 returned for the first two "len calculated" runs." I still don't understand how the code can return a different result just because in one case you're passing in a register which contains the calculated length, and in the other you're passing in the calculated length directly.
It's the order. When passing len(pString) as an argument, as in...
invoke BMBinSearch, 0, Search$,
len(Search$), sub$,
len(sub$)
... one would assume the following code:
invoke szLen, addr sub$
push eax
push sub$
invoke szLen, addr Search$
push Search$
push 0
call BMBinSearch
Compare that to the disassembly above, and you'll see that the order is wrong: eax gets trashed. Normally, the assembler would throw an error, but here it doesn't:
invoke szLen, addr sub$
invoke szLen, addr Search$
push eax
push sub$
push eax
push Search$
push 0
call BMBinSearch
In my case I'm not invoking any len-type function at all, in other words I'm not calculating the length at all, because my strings already have a length doubleword attached right to them (in a header). So I'm just copying the length doubleword stored in my strings header to the BM programs length doubleword, like this:
mov eax,vTXTnelm ; Number of elements in my source text string.
mov srcLngth,eax ; Copy that to the variable that the BM program uses.
And yes, I am PUSHing eax before I do that and POPing it afterwards so as not to destroy it.
Or am I still missing something?
Quote from: jj2007 on May 08, 2022, 12:08:50 AM
Quote from: Rav on May 08, 2022, 12:03:48 AMI just saw your latest post re "solved the issue with the -1 returned for the first two "len calculated" runs." I still don't understand how the code can return a different result just because in one case you're passing in a register which contains the calculated length, and in the other you're passing in the calculated length directly.
It's the order. When passing len(pString) as an argument, as in...
invoke BMBinSearch, 0, Search$, len(Search$), sub$, len(sub$)
... one would assume the following code:
invoke somelenfunction, addr sub$
push eax
push sub$
invoke somelenfunction, addr Search$
push Search$
push 0
call BMBinSearch
Compare that to the disassembly above, and you'll see that the order is wrong: eax gets trashed. Normally, the assembler would throw an error, but here it doesn't.
Your method is fine, no problem with that :thumbsup:
OP here. I guess the net of all this is, is there anyone here who can fix the sometimes incorrect results from the two Boyer-Moore string search algorithm implementations in the MASM32 SDK? These are BMBinSearch (Boyer-Moore, in file bm.asm) and BMHBinsearch (Boyer-Moore-Horspool, in file bmh.asm). I wish I could do it myself but I don't understand the algorithm itself. If they don't get fixed (or can't be) I'll just use something else not quite as fast, but of course I'd prefer it if they were.
Or can anyone point me to a MASM-compatible Boyer-Moore assembly (x86-32 bit) implementation elsewhere? I've Googled around but can't find one (except on this web site of course). It has to run in 32-bit processor mode, but it can use SIMD instructions (if I understand correctly, those are allowed in 32-bit mode if you use the .XMM directive).
Thanks / Rav
Another one, using the content of \Masm32\include\msvcrt.inc and
_imp__wscanf. The Horspool variant does ok:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$= ; -----...
sub$= _imp__wscanf
369 ms for BMBinSearch, len calculated, result=50522
695 ms for BMHBinsearch, len calculated, result=61855
158 ms for BMBinSearch, len supplied, result=50522
483 ms for BMHBinsearch, len supplied, result=61855
630 ms for MasmBasic Instr_, result=61856
765 ms for Masm32 SDK InString, result=0
247 ms for MasmBasic FAST Instr_, len supplied, result=61856
368 ms for BMBinSearch, len calculated, result=50522
689 ms for BMHBinsearch, len calculated, result=61855
159 ms for BMBinSearch, len supplied, result=50522
487 ms for BMHBinsearch, len supplied, result=61855
624 ms for MasmBasic Instr_, result=61856
743 ms for Masm32 SDK InString, result=0
255 ms for MasmBasic FAST Instr_, len supplied, result=61856
The normal BM variant has a problem:
Let sub$="_imp__wscanf" ; returns wrong result
Let sub$="imp__wscanf" ; returns good result
But the Horspool variant is consistently much slower:
460 ms for BMBinSearch, len calculated, result=61856
668 ms for BMHBinsearch, len calculated, result=61856
231 ms for BMBinSearch, len supplied, result=61856
458 ms for BMHBinsearch, len supplied, result=61856
616 ms for MasmBasic Instr_, result=61857
754 ms for Masm32 SDK InString, result=0
167 ms for MasmBasic FAST Instr_, len supplied, result=61857
Attention:
Boyer-Moore:
QuoteThe functions return either the zero based offset of the pattern being searched for or -1 if the pattern does not occur within the data being searched
Normal InString:
QuoteIf the function succeeds, it returns the 1 based index of the start of the substring or 0 if no match is found.
I wonder why the SDK InString doesn't work with a simple
invoke InString, 1, Search$, sub$ :sad:
Jochen (jj2007) -- I downloaded and installed MasmBasic and ran RichMasm.exe, and found the documentation for Instr_(), but I can't for the life of me figure out how to extract a (hopefully) self-contained .ASM file containing the implementation of Instr_, similar to the BM*.asm files in the MASM32 SDK. From the timings you've shown it looks like Instr_ is a fast string search algorithm, which would suit my needs. IS it available as a separate .ASM file that developers can use in their own programs? And does it run in 32-bit processor mode? Thanks.
Quote from: jj2007 on May 08, 2022, 02:08:36 AM
Another one, using the content of \Masm32\include\msvcrt.inc and _imp__wscanf. The Horspool variant does ok:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Search$= ; -----...
sub$= _imp__wscanf
369 ms for BMBinSearch, len calculated, result=50522
695 ms for BMHBinsearch, len calculated, result=61855
158 ms for BMBinSearch, len supplied, result=50522
483 ms for BMHBinsearch, len supplied, result=61855
630 ms for MasmBasic Instr_, result=61856
765 ms for Masm32 SDK InString, result=0
247 ms for MasmBasic FAST Instr_, len supplied, result=61856
368 ms for BMBinSearch, len calculated, result=50522
689 ms for BMHBinsearch, len calculated, result=61855
159 ms for BMBinSearch, len supplied, result=50522
487 ms for BMHBinsearch, len supplied, result=61855
624 ms for MasmBasic Instr_, result=61856
743 ms for Masm32 SDK InString, result=0
255 ms for MasmBasic FAST Instr_, len supplied, result=61856
The normal BM variant has a problem:
Let sub$="_imp__wscanf" ; returns wrong result
Let sub$="imp__wscanf" ; returns good result
But the Horspool variant is consistently much slower:
460 ms for BMBinSearch, len calculated, result=61856
668 ms for BMHBinsearch, len calculated, result=61856
231 ms for BMBinSearch, len supplied, result=61856
458 ms for BMHBinsearch, len supplied, result=61856
616 ms for MasmBasic Instr_, result=61857
754 ms for Masm32 SDK InString, result=0
167 ms for MasmBasic FAST Instr_, len supplied, result=61857
Attention:
Boyer-Moore:
QuoteThe functions return either the zero based offset of the pattern being searched for or -1 if the pattern does not occur within the data being searched
Normal InString:
QuoteIf the function succeeds, it returns the 1 based index of the start of the substring or 0 if no match is found.
I wonder why the SDK InString doesn't work with a simple invoke InString, 1, Search$, sub$ :sad:
Quote from: Rav on May 08, 2022, 04:07:54 AM
Jochen (jj2007) -- I downloaded and installed MasmBasic and ran RichMasm.exe, and found the documentation for Instr_(), but I can't for the life of me figure out how to extract a (hopefully) self-contained .ASM file containing the implementation of Instr_, similar to the BM*.asm files in the MASM32 SDK.
You can statically link to InstrCi in \Masm32\MasmBasic\MasmBasic.lib
The source is about 400 lines of highly idiosyncratic code, and I am sorry but it's a
closed source, because I don't want it to see copied & pasted all over the web.
Understood. I'm imagining (guessing really) that it might be using XMM registers and the PCMPESTRI (or related) instruction. At any rate, using PCMPESTRI is also something I'm going to research, if I can figure the silly thing out! Thanks again. / Rav
Quote from: jj2007 on May 08, 2022, 04:13:56 AM
Quote from: Rav on May 08, 2022, 04:07:54 AM
Jochen (jj2007) -- I downloaded and installed MasmBasic and ran RichMasm.exe, and found the documentation for Instr_(), but I can't for the life of me figure out how to extract a (hopefully) self-contained .ASM file containing the implementation of Instr_, similar to the BM*.asm files in the MASM32 SDK.
You can statically link to InstrCi in \Masm32\MasmBasic\MasmBasic.lib
The source is about 400 lines of highly idiosyncratic code, and I am sorry but it's a closed source, because I don't want it to see copied & pasted all over the web.
PCMPESTRI is SSE4.2; until now, I haven't gone beyond SSE2, in order to remain compatible to older CPUs. Check pcmpeqb if you are eager to roll your own ;-)
Hi Rav,
BMHBinsearch is reporting -1.
Quote from: Vortex on May 08, 2022, 05:14:50 AM
Hi Rav,
BMHBinsearch is reporting -1.
Thank you for checking. Unfortunately that's another confirmation that BMHBinsearch sometimes returns the wrong result (in addition to BMBinsearch with certain other strings).
Hi Erol,
Can you get Masm32 SDK InString to work with these strings?
Let Search$=FileRead$("\Masm32\include\msvcrt.inc") ; content of this file
Let sub$="_imp__wscanf" ; "_imp__wscanf" produces wrong result with BMBinSearch
invoke InString, 1, Search$, sub$ ; returns 0
Hi Jochen,
Here is the code :
include \masm32\include\masm32rt.inc
.data
filename db '\Masm32\include\msvcrt.inc',0
subs db '_imp__wscanf',0
msg db '1 based index of the start of the substring = %u',0
.data?
pMem dd ?
size1 dd ?
.code
start:
invoke read_disk_file,ADDR filename,\
ADDR pMem,ADDR size1
invoke InString,1,pMem,ADDR subs
invoke crt_printf,ADDR msg,eax
invoke GlobalFree,pMem
invoke ExitProcess,0
END start
The result is correct and verified with a hex editor reporting the offset of the substring :
1 based index of the start of the substring = 61856
Thanks a lot, Erol - your code works, and made me find my bug :thumbsup:
OP here. I think I've found a particular string and substring pair which causes the BMBinSearch version of the Boyer-Moore string searching program (\MASM32\M32LIB\bm.asm) to go into an endless loop, thus hanging and never returning, effectively crashing. At least here that's what it does.
String: '94764' (length 5)
Substring: '64' (length 2)
Jochen/Vortex/whoever, if you have a moment can you try these strings with that routine and verify that it hangs for you? Thanks / Rav
Confirmed :cool:
include \masm32\include\masm32rt.inc ; plain Masm32 for the fans of pure assembler
.code
start: cls
print "press Ctrl C to stop this", 13, 10
print "trying BMHBinsearch, 0, '947xx', 5, 'xx', 2", 13, 10
print str$(rv(BMHBinsearch, 0, "947xx", 5, "xx", 2)), " is the result for BMHBinsearch", 13, 10
print "trying BMBinSearch, 0, '947xx', 5, 'xx', 2", 13, 10
print str$(rv(BMBinSearch, 0, "947xx", 5, "xx", 2)), " is the result for BMBinSearch", 13, 10
MsgBox 0, "OK", "You won't see this box:", MB_OK
exit
end start
Using small samples in short strings to search is just about meaningless. Look at any BM variant and you will see that they are designed for finding longer sub strings in longer main strings. If you are doing searches of main strings under a few megabytes, you use a byte scanner as it will generally be fast enough and are byte perfect.
This topic should not be in the Campus, it should be in the lab.
Quote from: hutch-- on May 08, 2022, 09:58:15 AM
Using small samples in short strings to search is just about meaningless. Look at any BM variant and you will see that they are designed for finding longer sub strings in longer main strings. If you are doing searches of main strings under a few megabytes, you use a byte scanner as it will generally be fast enough and are byte perfect.
This topic should not be in the Campus, it should be in the lab.
Hi, hutch. In fact, I WAS using large strings in my testing (600K and upwards), when I found all these issues. But when reporting them to the forum I reduced them to a minimum length that still produced the error in order to simplify people being able to replicate them. Either way, whether it makes sense to use a byte scanner or not for small strings, I think that any program should always return the correct result no matter what size strings they are passed and, of course, not hang.
The BM algos are not that impressive, either. This searches Windows.inc for "Duplicate":
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
2534 ms for BMBinSearch, result=977320
2298 ms for MB Instr_(), result=977320
788 ms for new Instr_(), result=977320
2493 ms for BMBinSearch, result=977320
2309 ms for MB Instr_(), result=977320
784 ms for new Instr_(), result=977320
2488 ms for BMBinSearch, result=977320
2328 ms for MB Instr_(), result=977320
781 ms for new Instr_(), result=977320
:biggrin:
Rav,
> I think that any program should always return the correct result no matter what size strings they are passed and, of course, not hang.
What I did over 16 years ago was produce 3 BM algos according to the original design specs of Bob Boyer. Although I did massive testing of these algos over gigabytes of data, I don't believe there is an exhaustive technique to verify this family of algorithms.
The solution has always been if you need algos of this type and the existing ones are not doing the job you want, write them yourself, researching BM algos is a joy to behold, character building and a ton of fun if you can spend the life and time doing it.
A well written byte scanner is generally fast enough on modern hardware unless you are into gigabytes of data to search and there you will have to look for other designs of search algorithms.
The Boyer-Moore stuff is too slow for my taste. Good ol' Instr_() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1153) is already faster than BMBinSearch, and I am working on a new one that is even a tick faster. Results for a 4,397,206 bytes file:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
317 ms for BMBinSearch, result=4397069
199 ms for MB Instr_(), result=4397069
58 ms for new Instr_(), result=4397069
312 ms for BMBinSearch, result=4397069
199 ms for MB Instr_(), result=4397069
56 ms for new Instr_(), result=4397069
314 ms for BMBinSearch, result=4397069
197 ms for MB Instr_(), result=4397069
56 ms for new Instr_(), result=4397069
Just run the attached exe, it will grab the 4MB file from my personal site if it doesn't find it in the folder.
The type of testing that makes sense to me is start with a long string, your 4 meg is fine. Append a unique string pattern to search for, almost at the end, add something after it and test the search algo on something predictable. This means the algo must plough through a mountain of stuff where there are no matches to get to the near end to find a match.
What kills a BM based algo is the lookup table construction before any search is done. It must also repeatedly access the table which is a slow memory acess at a high hit rate. At their best, a very long search string with highly random data to search through yields fast results but there are many situations where they deliver sub optimal performance. Byte scanners are reliable and are not alignment sensitive.
I have seen attempts to use larger data sizes than bytes but the overhead usually kills them.
Quote from: hutch-- on May 09, 2022, 10:52:07 AM
The type of testing that makes sense to me is start with a long string, your 4 meg is fine. Append a unique string pattern to search for, almost at the end, add something after it and test the search algo on something predictable.
That's exactly what the test above does, and it yields exactly the same correct result for all three algos tested. Besides, you get an interesting text free of charge :wink2:
How long is your search string ?
13 bytes. With a 60 byte string, BM is still half as fast as the new algo.
You are wasting your time with such small strings, both the source and the search string. Any decent byte scanner will eat it alive when the source and search is so small. Try a 64 byte string on a few gigabytes of data to see which is faster in that context. A BM has never been any good on tiny search patterns, you get a break even point as they get longer then advantages if the string being searched does not have repetitive patterns.
I am working on 64 bit code at the moment so I don't have the time to play with this but from the research I did back 15 years or so, if the search pattern is long enough, it allows the search point to jump much longer distances which if it is long enough, it makes up for the table construction and the high memory reads.
Hello sir Rav;
This code is not optimized. I created this code after read this topic, to you do your tests in an easy way.
main.asm
;ML.EXE /c /coff /Cp main.asm
;LINK.EXE /SUBSYSTEM:CONSOLE main.obj
;main | echo %errorlevel%
include \masm32\include\masm32rt.inc
Boyer_Moore PROTO :dword,:dword,:dword,:dword,:dword
.data
file db "\masm32\include\windows.inc",0
string db "Duplicate"
align 4
Text_test db 'xx includix'
String_test db 'did'
.code
start:
;-----------------------------------
call main ;will search for string "Duplicate" in file \masm32\include\windows.inc
;-----------------------------------
;if you want to check specific cases, comment call above and uncoment line below to Text_test and String_test cases
; invoke Boyer_Moore,0,addr Text_test,sizeof Text_test,addr String_test,sizeof String_test
;-----------------------------------
invoke ExitProcess,eax
main proc uses esi edi ebx
LOCAL plc:DWORD ;cmd line pointer
LOCAL saida:DWORD ;output handle
LOCAL manipulador_arquivo:dword ;file handle
LOCAL tamanho:DWORD ;source file size
LOCAL mapeado:DWORD ;ptr source file
LOCAL mem_aloc:DWORD ;ptr mem alloc
LOCAL nada:dword
LOCAL arq_mape:DWORD ;mapped file handle
LOCAL pos:DWORD ;initial position
LOCAL return_value:DWORD ;return result in eax register
invoke GetStdHandle,STD_OUTPUT_HANDLE
mov saida,eax
invoke GetCommandLine
mov plc,eax
invoke CreateFile,addr file,GENERIC_READ,0,0,OPEN_EXISTING,FILE_FLAG_SEQUENTIAL_SCAN,0 ;open source file
mov manipulador_arquivo,eax ;handle
invoke GetFileSize, manipulador_arquivo,NULL ;sizeof source file
mov tamanho,eax
invoke CreateFileMapping,manipulador_arquivo,NULL,PAGE_READONLY,0,0,0 ;mapped file handle
mov arq_mape,eax
invoke MapViewOfFile,eax,FILE_MAP_READ,0,0,0 ;map it, file contents pointer
mov mapeado,eax
invoke CloseHandle,manipulador_arquivo ;don't need this anymore
mov pos,0
invoke Boyer_Moore,pos,mapeado,tamanho,addr string,sizeof string
mov return_value,eax
invoke UnmapViewOfFile,mapeado
mov eax,return_value
ret
main endp
Boyer_Moore proc uses ebx esi edi startpos:dword,lpSource:dword,srcLngth:dword,lpSubStr:dword,subLngth:dword
LOCAL bad_chars[256]:dword
mov eax,-2 ;assume error in parameters data
.if srcLngth == 0
ret
.endif
.if subLngth == 0
ret
.endif
mov ecx,subLngth
.if ecx > srcLngth
ret
.endif
mov ecx,startpos
.if ecx > srcLngth
ret
.endif
lea edi,bad_chars
mov ecx,256
mov eax,subLngth
rep stosd
lea edi,bad_chars
mov esi,lpSubStr
mov ecx,subLngth
add esi,ecx
dec esi
movzx eax,byte ptr [esi]
mov dword ptr [edi+eax*4],1
mov ebx,1
dec esi
dec ecx
mov edx,subLngth
.while ecx != 0
movzx eax,byte ptr [esi]
.if dword ptr [edi+eax*4] == edx
mov dword ptr [edi+eax*4],ebx
inc ebx
.endif
dec esi
dec ecx
.endw
mov esi,lpSubStr
mov edi,lpSource
mov eax,startpos
add edi,eax
mov ebx,srcLngth
add ebx,lpSource
try_again:
.if edi >= ebx
mov eax,-1
ret
.endif
mov ecx,subLngth
add ecx,edi
.if edi > ebx
mov eax,-1
ret
.endif
mov ecx,subLngth
@@:
movzx eax,byte ptr [edi+ecx-1]
.if byte ptr [esi+ecx-1] == al
dec ecx
jnz @B
mov eax,edi
sub eax,lpSource
ret
.else
mov eax,dword ptr [bad_chars+eax*4]
.if eax != subLngth
add edi,eax
jmp try_again
.else
add edi,ecx
jmp try_again
.endif
.endif
ret
Boyer_Moore endp
end start
Quote from: mineiro on May 09, 2022, 10:44:17 PM
Hello sir Rav;
This code is not optimized. I created this code after read this topic, to you do your tests in an easy way.
Thank you. / Rav
I have commented code, maybe can be usefull:
;ML.EXE /c /coff /Cp main.asm
;LINK.EXE /SUBSYSTEM:CONSOLE main.obj
;main | echo %errorlevel%
include \masm32\include\masm32rt.inc
Boyer_Moore PROTO :dword,:dword,:dword,:dword,:dword
.data
file db "\masm32\include\windows.inc",0
string db "Duplicate"
align 4
Text_test db 'a'
String_test db 'a'
;Text_test db 'xx includix'
;String_test db 'did'
.code
start:
;-----------------------------------
; call main ;will search for string "Duplicate" in file \masm32\include\windows.inc
;-----------------------------------
;if you want to check specific cases, comment call above and uncoment line below to Text_test and String_test cases
invoke Boyer_Moore,0,addr Text_test,sizeof Text_test,addr String_test,sizeof String_test
;-----------------------------------
invoke ExitProcess,eax
main proc uses esi edi ebx
LOCAL plc:DWORD ;cmd line pointer
LOCAL saida:DWORD ;output handle
LOCAL manipulador_arquivo:dword ;file handle
LOCAL tamanho:DWORD ;source file size
LOCAL mapeado:DWORD ;ptr source file
LOCAL mem_aloc:DWORD ;ptr mem alloc
LOCAL nada:dword
LOCAL arq_mape:DWORD ;mapped file handle
LOCAL pos:DWORD ;initial position
LOCAL return_value:DWORD ;return result in eax register
invoke GetStdHandle,STD_OUTPUT_HANDLE
mov saida,eax
invoke GetCommandLine
mov plc,eax
invoke CreateFile,addr file,GENERIC_READ,0,0,OPEN_EXISTING,FILE_FLAG_SEQUENTIAL_SCAN,0 ;open source file
mov manipulador_arquivo,eax ;handle
invoke GetFileSize, manipulador_arquivo,NULL ;sizeof source file
mov tamanho,eax
invoke CreateFileMapping,manipulador_arquivo,NULL,PAGE_READONLY,0,0,0 ;mapped file handle
mov arq_mape,eax
invoke MapViewOfFile,eax,FILE_MAP_READ,0,0,0 ;map it, file contents pointer
mov mapeado,eax
invoke CloseHandle,manipulador_arquivo ;don't need this anymore
mov pos,0
invoke Boyer_Moore,pos,mapeado,tamanho,addr string,sizeof string
mov return_value,eax
invoke UnmapViewOfFile,mapeado
;invoke GlobalFree,mem_aloc
mov eax,return_value
ret
main endp
Boyer_Moore proc uses ebx esi edi startpos:dword,lpSource:dword,srcLngth:dword,lpSubStr:dword,subLngth:dword
LOCAL bad_chars[256]:dword
mov eax,-2 ;assume error in parameters data
.if srcLngth == 0 ;input size is zero?
ret
.endif
.if subLngth == 0 ;substring size is zero?
ret
.endif
mov ecx,subLngth
.if ecx > srcLngth ;substring size is bigger than source size?
ret
.endif
mov ecx,startpos ;start position is bigger than source size?
.if ecx > srcLngth
ret
.endif
lea edi,bad_chars ;create a lookup table filled with substring size
mov ecx,256
mov eax,subLngth
rep stosd
lea edi,bad_chars
mov esi,lpSubStr
mov ecx,subLngth
add esi,ecx
dec esi ;point to last char of substring, zero index based
;-------------------------
;eg: string = "batata", a=1, t=1, b=2, any other chars will be = 6 (sizeof string)
;eg: string = "done", e=1, n=1, o=2, d=3, any other chars will be = 4
;-------------------------
movzx eax,byte ptr [esi] ;read last char
mov dword ptr [edi+eax*4],1 ;its size will be one displacement, insert in bad_chars table
mov ebx,1 ;second char will be one too, third char will be 2, fourth char will be 3, ...
dec esi
dec ecx
mov edx,subLngth
.while ecx != 0 ;while exist chars in substring
movzx eax,byte ptr [esi] ;read that char
.if dword ptr [edi+eax*4] == edx ;actual char have a sizeof subLngth?
mov dword ptr [edi+eax*4],ebx ;yes, so overwrite this char displacement substring in lookup table
inc ebx ;increase displacement
.endif
dec esi
dec ecx
.endw
mov esi,lpSubStr
mov edi,lpSource
mov eax,startpos
add edi,eax ;edi=lPSource+startpos
mov ebx,srcLngth
add ebx,lpSource ;ebx=lpSource+srcLngth
try_again:
.if edi >= ebx ;we reach end of source?, exit
mov eax,-1
ret
.endif
mov ecx,subLngth
@@:
movzx eax,byte ptr [edi+ecx-1] ;read one char from source
.if byte ptr [esi+ecx-1] == al ;is this char equal to string char?
dec ecx ;yes, so, sizeof string = sizeof string -1
jnz @B ;repeat
mov eax,edi ;found string in text
sub eax,lpSource ;calculate position
ret ;return to caller
.else ;no, this char in source is not equal to string char
mov eax,dword ptr [bad_chars+eax*4] ;but this char it's a good char? This char exist in string chars?
.if eax != subLngth ;yes
add edi,eax ;so, align string to next position and try again
jmp try_again
.else ;no, actual char do not exist in string, we "should" add sizeof string displacement or less
;but actual char does not mean previous equal chars comparisions
add edi,ecx ;so, align string to next position and try again
jmp try_again
.endif
.endif
ret
Boyer_Moore endp
end start
Follow 2 codes, first was optimized in logic and code. Second was optimized in fact that lookup table does not need be created at each call to BM procedure, only in first call that table will be created and sucessive calls will skip table creation.
Now code prints position, and second version print ocurrences found:
;version 0.01
;ML.EXE /c /coff /Cp main2.asm
;LINK.EXE /SUBSYSTEM:CONSOLE main2.obj
;main
include \masm32\include\masm32rt.inc
.686
Boyer_Moore PROTO :dword,:dword,:dword,:dword,:dword
printf proto C :dword, :vararg ; msvcrt
.data
format_string db "%d",13,10,0
file db "\masm32\include\windows.inc",0
string db "Duplicate"
align 4
Text_test db 'xx includix'
String_test db 'did'
.code
start:
;-----------------------------------
call main ;will search for string "Duplicate" in file \masm32\include\windows.inc
;-----------------------------------
;if you want to check specific cases, comment call above and uncoment line below to Text_test and String_test cases
; invoke Boyer_Moore,0,addr Text_test,sizeof Text_test,addr String_test,sizeof String_test
;-----------------------------------
invoke ExitProcess,eax
main proc uses esi edi ebx
LOCAL plc:DWORD ;cmd line pointer
LOCAL saida:DWORD ;output handle
LOCAL manipulador_arquivo:dword ;file handle
LOCAL tamanho:DWORD ;source file size
LOCAL mapeado:DWORD ;ptr source file
LOCAL mem_aloc:DWORD ;ptr mem alloc
LOCAL nada:dword
LOCAL arq_mape:DWORD ;mapped file handle
LOCAL pos:DWORD ;initial position
LOCAL return_value:DWORD ;return result in eax register
invoke GetStdHandle,STD_OUTPUT_HANDLE
mov saida,eax
invoke GetCommandLine
mov plc,eax
invoke CreateFile,addr file,GENERIC_READ,0,0,OPEN_EXISTING,FILE_FLAG_SEQUENTIAL_SCAN,0 ;open source file
mov manipulador_arquivo,eax ;handle
invoke GetFileSize, manipulador_arquivo,NULL ;sizeof source file
mov tamanho,eax
invoke CreateFileMapping,manipulador_arquivo,NULL,PAGE_READONLY,0,0,0 ;mapped file handle
mov arq_mape,eax
invoke MapViewOfFile,eax,FILE_MAP_READ,0,0,0 ;map it, file contents pointer
mov mapeado,eax
invoke CloseHandle,manipulador_arquivo ;don't need this anymore
mov pos,0
@@:
invoke Boyer_Moore,pos,mapeado,tamanho,addr string,sizeof string
mov return_value,eax
invoke printf,addr format_string,return_value
.if return_value == -1
jmp done
.elseif return_value == -2
jmp done
.else
inc return_value
m2m pos,return_value
jmp @B
.endif
done:
invoke UnmapViewOfFile,mapeado
mov eax,return_value
ret
main endp
Boyer_Moore proc uses ebx esi edi startpos:dword,lpSource:dword,srcLngth:dword,lpSubStr:dword,subLngth:dword
LOCAL bad_chars[256]:dword
lea edi,bad_chars ;create a lookup table filled with substring size
mov ecx,256
mov eax,subLngth
rep stosd
lea edi,bad_chars
mov esi,lpSubStr
mov ecx,subLngth
add esi,subLngth
dec esi ;point to last char of substring, zero index based
;-------------------------
;eg: string = "batata", a=1, t=1, b=5, any other chars will be = 6 (sizeof string)
;eg: string = "done", e=1, n=1, o=2, d=3, any other chars will be = 4
;-------------------------
movzx eax,byte ptr [esi] ;read last char
mov dword ptr [edi+eax*4],1 ;its size will be one displacement, insert in bad_chars table
mov ebx,1 ;second char will be one too, third char will be 2, fourth char will be 3, ...
dec esi
dec ecx
mov edx,subLngth
.while ecx != 0 ;while exist chars in substring
movzx eax,byte ptr [esi] ;read that char
.if dword ptr [edi+eax*4] == edx ;actual char have a sizeof subLngth?
mov dword ptr [edi+eax*4],ebx ;yes, so overwrite this char displacement substring in lookup table
inc ebx ;increase displacement ;<<<<<
.endif
dec esi
dec ecx
.endw
;------------------------------
mov eax,-2 ;assume error in parameters data
mov esi,lpSubStr
mov edi,lpSource
mov ebx,srcLngth
mov ecx,subLngth
test ebx,ebx
jz done
test ecx,ecx
jz done
cmp ecx,ebx
ja done
add ebx,edi ;ebx=lpSource+srcLngth
add edi,startpos ;edi=lpSource+startpos
mov edx,subLngth
align 4
try_again:
mov eax,-1
cmp ebx,edi
jbe done
mov ecx,edx
@@:
movzx eax,byte ptr [edi+ecx-1] ;read one char from source
cmp byte ptr [esi+ecx-1],al
jne @F
dec ecx
jnz @B
sub edi,lpSource
mov eax,edi
jmp done
@@:
mov eax,dword ptr [bad_chars+eax*4]
cmp eax,edx ;cmp eax,edx
cmove eax,ecx
add edi,eax
jmp try_again
align 4
done:
ret
Boyer_Moore endp
end start
--------------------------------------
;version 0.02
;ML.EXE /c /coff /Cp main.asm
;LINK.EXE /SUBSYSTEM:CONSOLE main.obj
;main
include \masm32\include\masm32rt.inc
.686
Boyer_Moore PROTO :dword,:dword,:dword,:dword,:dword,:dword
printf proto C :dword, :vararg ; msvcrt
.data
format_string db "pos: %d",13,10,0
occurrences db "found %d ocurrence(s)",13,10,0
file db "\masm32\include\windows.inc",0
string db "Duplicate"
align 4
Text_test db 'xx includix'
String_test db 'did'
.data?
bad_chars dd 256 dup (?)
.code
start:
;-----------------------------------
call main ;will search for string "Duplicate" in file \masm32\include\windows.inc
;-----------------------------------
;if you want to check specific cases, comment call above and uncoment lines below to Text_test and String_test cases
; invoke Boyer_Moore,0,addr Text_test,sizeof Text_test,addr String_test,sizeof String_test,0
; push eax
; invoke printf,addr format_string,eax
; pop eax
;-----------------------------------
invoke ExitProcess,eax
main proc uses esi edi ebx
LOCAL plc:DWORD ;cmd line pointer
LOCAL saida:DWORD ;output handle
LOCAL manipulador_arquivo:dword ;file handle
LOCAL tamanho:DWORD ;source file size
LOCAL mapeado:DWORD ;ptr source file
LOCAL mem_aloc:DWORD ;ptr mem alloc
LOCAL nada:dword
LOCAL arq_mape:DWORD ;mapped file handle
LOCAL pos:DWORD ;initial position
LOCAL return_value:DWORD ;return result in eax register
LOCAL first_r:DWORD
invoke GetStdHandle,STD_OUTPUT_HANDLE
mov saida,eax
invoke GetCommandLine
mov plc,eax
invoke CreateFile,addr file,GENERIC_READ,0,0,OPEN_EXISTING,FILE_FLAG_SEQUENTIAL_SCAN,0 ;open source file
mov manipulador_arquivo,eax ;handle
invoke GetFileSize, manipulador_arquivo,NULL ;sizeof source file
mov tamanho,eax
invoke CreateFileMapping,manipulador_arquivo,NULL,PAGE_READONLY,0,0,0 ;mapped file handle
mov arq_mape,eax
invoke MapViewOfFile,eax,FILE_MAP_READ,0,0,0 ;map it, file contents pointer
mov mapeado,eax
invoke CloseHandle,manipulador_arquivo ;don't need this anymore
mov pos,0
mov first_r,0 ;inserted one more parameter to Boyer_Moore procedure
;if first_run is zero, so table creation is done
;if first_run is not zero, so table creation is skipped, gaining time execution
;so, table bad_chars was moved to data section to be globally accessed
@@:
invoke Boyer_Moore,pos,mapeado,tamanho,addr string,sizeof string,first_r
mov return_value,eax
.if return_value == -1
jmp done
.elseif return_value == -2
jmp done
.else
invoke printf,addr format_string,return_value
inc return_value
m2m pos,return_value
inc first_r ;this is a occurrences count, and will trigger to not recreate lookup table
jmp @B
.endif
done:
invoke UnmapViewOfFile,mapeado
invoke printf,addr occurrences,first_r
mov eax,return_value
ret
main endp
Boyer_Moore proc uses ebx esi edi startpos:dword,lpSource:dword,srcLngth:dword,lpSubStr:dword,subLngth:dword,first_run:dword
cmp first_run,0
jnz @F
lea edi,bad_chars ;create a lookup table filled with substring size
mov ecx,256
mov eax,subLngth
rep stosd
lea edi,bad_chars
mov esi,lpSubStr
mov ecx,subLngth
add esi,subLngth
dec esi ;point to last char of substring, zero index based
;-------------------------
;eg: string = "batata", a=1, t=1, b=5, any other chars will be = 6 (sizeof string)
;eg: string = "done", e=1, n=1, o=2, d=3, any other chars will be = 4
;-------------------------
movzx eax,byte ptr [esi] ;read last char
mov dword ptr [edi+eax*4],1 ;its size will be one displacement, insert in bad_chars table
mov ebx,1 ;second char will be one too, third char will be 2, fourth char will be 3, ...
dec esi
dec ecx
mov edx,subLngth
.while ecx != 0 ;while exist chars in substring
movzx eax,byte ptr [esi] ;read that char
.if dword ptr [edi+eax*4] == edx ;actual char have a sizeof subLngth?
mov dword ptr [edi+eax*4],ebx ;yes, so overwrite this char displacement substring in lookup table
inc ebx ;increase displacement
.endif
dec esi
dec ecx
.endw
;------------------------------
@@:
mov eax,-2 ;assume error in parameters data
mov esi,lpSubStr
mov edi,lpSource
mov ebx,srcLngth
mov ecx,subLngth
test ebx,ebx
jz done
test ecx,ecx
jz done
cmp ecx,ebx
ja done
add ebx,edi ;ebx=lpSource+srcLngth
add edi,startpos ;edi=lpSource+startpos
mov edx,subLngth
align 4
try_again:
mov eax,-1
cmp ebx,edi
jbe done
mov ecx,edx
@@:
movzx eax,byte ptr [edi+ecx-1] ;read one char from source
cmp byte ptr [esi+ecx-1],al
jne @F
dec ecx
jnz @B
sub edi,lpSource
mov eax,edi
jmp done
@@:
mov eax,dword ptr [bad_chars+eax*4]
cmp eax,edx ;test eax,edx
cmove eax,ecx
add edi,eax
jmp try_again
align 4
done:
ret
Boyer_Moore endp
end start
Code can be optimized more, but I need travel tomorrow, so, ... .
---edited---
Corrected code. First fault was optimization used "test" instead of "cmp", second was a logic problem.
@Mineiro - timings using the second version:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Finding a string 1000 times in a 954kB 'haystack' from\Masm32\include\Windows.inc
... using a 83 bytes substring
1119 ms for CRT strstr result=977321
225 ms for BMHBinsearch result=977321
100 ms for BMBinSearch result=977321
94 ms for Boyer-M Mineiro result=977321
78 ms for MB Instr_() result=977321
... using a 43 bytes substring
1106 ms for CRT strstr result=977321
326 ms for BMHBinsearch result=977321
252 ms for BMBinSearch result=977321
241 ms for Boyer-M Mineiro result=977321
81 ms for MB Instr_() result=977321
... using a 22 bytes substring
1105 ms for CRT strstr result=977321
640 ms for BMHBinsearch result=977321
436 ms for BMBinSearch result=977321
410 ms for Boyer-M Mineiro result=0
80 ms for MB Instr_() result=977321
... using a 11 bytes substring
1099 ms for CRT strstr result=977321
935 ms for BMHBinsearch result=977321
1284 ms for BMBinSearch result=977321
1372 ms for Boyer-M Mineiro result=977321
78 ms for MB Instr_() result=977321
... using a 6 bytes substring
1090 ms for CRT strstr result=977321
339 ms for BMHBinsearch result=977321
302 ms for BMBinSearch result=0
762 ms for Boyer-M Mineiro result=977321
80 ms for MB Instr_() result=977321
**** last sub$ was [Duplic] -- hit any key ****
Building this project requires MasmBasic version 10 May 22 (http://masm32.com/board/index.php?topic=94.0) :cool:
P.S.: The second attachment is built with both your first and your second code, and a finer granularity for the length of the substrings showing where the problems are (idem for BMBinSearch - the Horspool variant seems ok). Here is its output:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Finding a string 10 times in a 954kB 'haystack' from\Masm32\include\Windows.inc
... using a 83 bytes substring [Duplicate include file windows.inc<crlf>echo ------------------------------------------]
11 ms for CRT strstr result=977321
2401 µs for BMHBinsearch result=977321
1082 µs for BMBinSearch result=977321
996 µs for Boyer-M Mineiro 1 result=977321
998 µs for Boyer-M Mineiro 2 result=977321
835 µs for MB Instr_() result=977321
... using a 71 bytes substring [Duplicate include file windows.inc<crlf>echo ------------------------------]
11 ms for CRT strstr result=977321
2989 µs for BMHBinsearch result=977321
1280 µs for BMBinSearch result=977321
1177 µs for Boyer-M Mineiro 1 result=0
1185 µs for Boyer-M Mineiro 2 result=0
834 µs for MB Instr_() result=977321
... using a 60 bytes substring [Duplicate include file windows.inc<crlf>echo -------------------]
11 ms for CRT strstr result=977321
3095 µs for BMHBinsearch result=977321
1620 µs for BMBinSearch result=977321
1399 µs for Boyer-M Mineiro 1 result=0
1403 µs for Boyer-M Mineiro 2 result=0
834 µs for MB Instr_() result=977321
... using a 51 bytes substring [Duplicate include file windows.inc<crlf>echo ----------]
11 ms for CRT strstr result=977321
3022 µs for BMHBinsearch result=977321
1891 µs for BMBinSearch result=977321
1637 µs for Boyer-M Mineiro 1 result=977321
1665 µs for Boyer-M Mineiro 2 result=977321
843 µs for MB Instr_() result=977321
... using a 43 bytes substring [Duplicate include file windows.inc<crlf>echo --]
12 ms for CRT strstr result=977321
3486 µs for BMHBinsearch result=977321
2676 µs for BMBinSearch result=977321
2751 µs for Boyer-M Mineiro 1 result=977321
2615 µs for Boyer-M Mineiro 2 result=977321
837 µs for MB Instr_() result=977321
... using a 37 bytes substring [Duplicate include file windows.inc<crlf>e]
11 ms for CRT strstr result=977321
4709 µs for BMHBinsearch result=977321
2301 µs for BMBinSearch result=977321
2152 µs for Boyer-M Mineiro 1 result=977321
2377 µs for Boyer-M Mineiro 2 result=977321
1082 µs for MB Instr_() result=977321
... using a 31 bytes substring [Duplicate include file windows.]
11 ms for CRT strstr result=977321
5671 µs for BMHBinsearch result=977321
2737 µs for BMBinSearch result=977321
2609 µs for Boyer-M Mineiro 1 result=977321
2715 µs for Boyer-M Mineiro 2 result=977321
1113 µs for MB Instr_() result=977321
... using a 26 bytes substring [Duplicate include file win]
11 ms for CRT strstr result=977321
6292 µs for BMHBinsearch result=977321
4377 µs for BMBinSearch result=977321
4338 µs for Boyer-M Mineiro 1 result=0
3937 µs for Boyer-M Mineiro 2 result=0
1188 µs for MB Instr_() result=977321
... using a 22 bytes substring [Duplicate include file]
11 ms for CRT strstr result=977321
6236 µs for BMHBinsearch result=977321
4743 µs for BMBinSearch result=977321
4248 µs for Boyer-M Mineiro 1 result=0
4701 µs for Boyer-M Mineiro 2 result=0
1156 µs for MB Instr_() result=977321
... using a 19 bytes substring [Duplicate include f]
11 ms for CRT strstr result=977321
7136 µs for BMHBinsearch result=977321
10 ms for BMBinSearch result=977321
10 ms for Boyer-M Mineiro 1 result=977321
9423 µs for Boyer-M Mineiro 2 result=977321
921 µs for MB Instr_() result=977321
... using a 16 bytes substring [Duplicate includ]
11 ms for CRT strstr result=977321
8211 µs for BMHBinsearch result=977321
5034 µs for BMBinSearch result=977321
3128 µs for Boyer-M Mineiro 1 result=0
3165 µs for Boyer-M Mineiro 2 result=0
935 µs for MB Instr_() result=977321
... using a 14 bytes substring [Duplicate incl]
11 ms for CRT strstr result=977321
8635 µs for BMHBinsearch result=977321
6008 µs for BMBinSearch result=977321
5635 µs for Boyer-M Mineiro 1 result=977321
5693 µs for Boyer-M Mineiro 2 result=977321
948 µs for MB Instr_() result=977321
... using a 12 bytes substring [Duplicate in]
11 ms for CRT strstr result=977321
9443 µs for BMHBinsearch result=977321
8710 µs for BMBinSearch result=977321
4186 µs for Boyer-M Mineiro 1 result=977321
4147 µs for Boyer-M Mineiro 2 result=977321
973 µs for MB Instr_() result=977321
... using a 10 bytes substring [Duplicate ]
11 ms for CRT strstr result=977321
6199 µs for BMHBinsearch result=977321
3345 µs for BMBinSearch result=0
7191 µs for Boyer-M Mineiro 1 result=977321
6984 µs for Boyer-M Mineiro 2 result=977321
1518 µs for MB Instr_() result=977321
... using a 9 bytes substring [Duplicate]
11 ms for CRT strstr result=977321
3473 µs for BMHBinsearch result=977321
2853 µs for BMBinSearch result=977321
5692 µs for Boyer-M Mineiro 1 result=977321
5833 µs for Boyer-M Mineiro 2 result=977321
1111 µs for MB Instr_() result=977321
... using a 8 bytes substring [Duplicat]
11 ms for CRT strstr result=977321
3303 µs for BMHBinsearch result=977321
2778 µs for BMBinSearch result=977321
6208 µs for Boyer-M Mineiro 1 result=0
6207 µs for Boyer-M Mineiro 2 result=0
1128 µs for MB Instr_() result=977321
... using a 7 bytes substring [Duplica]
11 ms for CRT strstr result=977321
3625 µs for BMHBinsearch result=977321
3216 µs for BMBinSearch result=977321
7117 µs for Boyer-M Mineiro 1 result=977321
7055 µs for Boyer-M Mineiro 2 result=977321
1237 µs for MB Instr_() result=977321
... using a 6 bytes substring [Duplic]
11 ms for CRT strstr result=977321
3689 µs for BMHBinsearch result=977321
3376 µs for BMBinSearch result=0
8198 µs for Boyer-M Mineiro 1 result=977321
8125 µs for Boyer-M Mineiro 2 result=977321
1222 µs for MB Instr_() result=977321
... using a 5 bytes substring [Dupli]
11 ms for CRT strstr result=977321
4588 µs for BMHBinsearch result=977321
4129 µs for BMBinSearch result=0
9861 µs for Boyer-M Mineiro 1 result=977321
9925 µs for Boyer-M Mineiro 2 result=977321
954 µs for MB Instr_() result=977321
... using a 4 bytes substring [Dupl]
4551 µs for CRT strstr result=391356
2000 µs for BMHBinsearch result=391356
1821 µs for BMBinSearch result=391356
12 ms for Boyer-M Mineiro 1 result=0
12 ms for Boyer-M Mineiro 2 result=0
455 µs for MB Instr_() result=391356
Thanks sir jj2007;
I found where the errors sits in my code.
First was an optimization, used "test" instead of "cmp".
Second was an logic error optimization, added more displacement than need.
I have corrected previous posts.
This only occurs after I tried to optimize code.
Quote from: mineiro on May 10, 2022, 08:22:38 PM
I found where the errors sits in my code.
Thanks, mineiro :thumbsup:
So far no errors found in your code. I am now using a 100MB "haystack" based on Windows.inc with "Doplicate" added to the end for the search string:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Finding a string 5 times in a 100MB 'haystack' from\Masm32\include\Windows.inc
... using a 83 bytes substring [Doplicate include file windows.inc<crlf>echo -------------------
621 ms for CRT strstr result=105560497
618 ms for M32 InString result=105560497
137 ms for BMHBinsearch result=0
75 ms for BMBinSearch result=105560497
178 ms for Boyer-M Mineiro 1 result=105560497
185 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 71 bytes substring [Doplicate include file windows.inc<crlf>echo -------------------
625 ms for CRT strstr result=105560497
588 ms for M32 InString result=105560497
167 ms for BMHBinsearch result=105560497
84 ms for BMBinSearch result=105560497
232 ms for Boyer-M Mineiro 1 result=105560497
240 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 60 bytes substring [Doplicate include file windows.inc<crlf>echo -------------------
620 ms for CRT strstr result=105560497
590 ms for M32 InString result=105560497
177 ms for BMHBinsearch result=105560497
99 ms for BMBinSearch result=105560497
247 ms for Boyer-M Mineiro 1 result=105560497
251 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 51 bytes substring [Doplicate include file windows.inc<crlf>echo ----------]
613 ms for CRT strstr result=105560497
576 ms for M32 InString result=105560497
176 ms for BMHBinsearch result=105560497
117 ms for BMBinSearch result=105560497
242 ms for Boyer-M Mineiro 1 result=105560497
243 ms for Boyer-M Mineiro 2 result=105560497
63 ms for MB Instr_() result=105560497
... using a 43 bytes substring [Doplicate include file windows.inc<crlf>echo --]
626 ms for CRT strstr result=105560497
578 ms for M32 InString result=105560497
181 ms for BMHBinsearch result=105560497
151 ms for BMBinSearch result=105560497
203 ms for Boyer-M Mineiro 1 result=105560497
205 ms for Boyer-M Mineiro 2 result=105560497
64 ms for MB Instr_() result=105560497
... using a 37 bytes substring [Doplicate include file windows.inc<crlf>e]
616 ms for CRT strstr result=105560497
586 ms for M32 InString result=105560497
262 ms for BMHBinsearch result=105560497
124 ms for BMBinSearch result=105560497
137 ms for Boyer-M Mineiro 1 result=105560497
143 ms for Boyer-M Mineiro 2 result=105560497
63 ms for MB Instr_() result=105560497
... using a 31 bytes substring [Doplicate include file windows.]
618 ms for CRT strstr result=105560497
582 ms for M32 InString result=105560497
310 ms for BMHBinsearch result=105560497
148 ms for BMBinSearch result=105560497
152 ms for Boyer-M Mineiro 1 result=105560497
154 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 26 bytes substring [Doplicate include file win]
622 ms for CRT strstr result=105560497
582 ms for M32 InString result=105560497
344 ms for BMHBinsearch result=105560497
254 ms for BMBinSearch result=105560497
255 ms for Boyer-M Mineiro 1 result=105560497
246 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 22 bytes substring [Doplicate include file]
624 ms for CRT strstr result=105560497
581 ms for M32 InString result=105560497
359 ms for BMHBinsearch result=105560497
259 ms for BMBinSearch result=105560497
252 ms for Boyer-M Mineiro 1 result=105560497
254 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 19 bytes substring [Doplicate include f]
620 ms for CRT strstr result=105560497
582 ms for M32 InString result=105560497
387 ms for BMHBinsearch result=105560497
547 ms for BMBinSearch result=105560497
575 ms for Boyer-M Mineiro 1 result=105560497
576 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 16 bytes substring [Doplicate includ]
624 ms for CRT strstr result=105560497
595 ms for M32 InString result=105560497
442 ms for BMHBinsearch result=105560497
265 ms for BMBinSearch result=105560497
253 ms for Boyer-M Mineiro 1 result=105560497
259 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 14 bytes substring [Doplicate incl]
622 ms for CRT strstr result=105560497
585 ms for M32 InString result=105560497
463 ms for BMHBinsearch result=105560497
319 ms for BMBinSearch result=105560497
310 ms for Boyer-M Mineiro 1 result=105560497
308 ms for Boyer-M Mineiro 2 result=105560497
60 ms for MB Instr_() result=105560497
... using a 12 bytes substring [Doplicate in]
620 ms for CRT strstr result=105560497
594 ms for M32 InString result=105560497
506 ms for BMHBinsearch result=105560497
469 ms for BMBinSearch result=105560497
483 ms for Boyer-M Mineiro 1 result=105560497
480 ms for Boyer-M Mineiro 2 result=105560497
60 ms for MB Instr_() result=105560497
... using a 10 bytes substring [Doplicate ]
634 ms for CRT strstr result=105560497
609 ms for M32 InString result=105560497
307 ms for BMHBinsearch result=105560497
173 ms for BMBinSearch result=105560497
916 ms for Boyer-M Mineiro 1 result=105560497
913 ms for Boyer-M Mineiro 2 result=105560497
63 ms for MB Instr_() result=105560497
... using a 9 bytes substring [Doplicate]
613 ms for CRT strstr result=105560497
617 ms for M32 InString result=105560497
173 ms for BMHBinsearch result=105560497
142 ms for BMBinSearch result=105560497
312 ms for Boyer-M Mineiro 1 result=105560497
310 ms for Boyer-M Mineiro 2 result=105560497
60 ms for MB Instr_() result=105560497
... using a 8 bytes substring [Doplicat]
619 ms for CRT strstr result=105560497
579 ms for M32 InString result=105560497
161 ms for BMHBinsearch result=105560497
137 ms for BMBinSearch result=0
334 ms for Boyer-M Mineiro 1 result=105560497
331 ms for Boyer-M Mineiro 2 result=105560497
62 ms for MB Instr_() result=105560497
... using a 7 bytes substring [Doplica]
616 ms for CRT strstr result=105560497
576 ms for M32 InString result=105560497
174 ms for BMHBinsearch result=105560497
146 ms for BMBinSearch result=105560497
369 ms for Boyer-M Mineiro 1 result=105560497
368 ms for Boyer-M Mineiro 2 result=105560497
63 ms for MB Instr_() result=105560497
... using a 6 bytes substring [Doplic]
618 ms for CRT strstr result=105560497
580 ms for M32 InString result=105560497
185 ms for BMHBinsearch result=105560497
161 ms for BMBinSearch result=0
428 ms for Boyer-M Mineiro 1 result=105560497
432 ms for Boyer-M Mineiro 2 result=105560497
60 ms for MB Instr_() result=105560497
Follow 3 adaptations from Boyer Moore algorithm with error check (no bounds error, zero size of string and pattern, pattern bigger than string).
This version its a byte search, so you can search for ascii, unicode, utf8, binary data, ... .
The nice thing is that algorithm can be implemented more.
One situation happens when pattern to search have a lot of repeating chars. One option is to search for less frequent char (byte). Other option is to create a strong prefix characters position (I avoid this because adds more complexity to algorithm (it's based in single digraphs, trigraphs, ..., read Boyer Moore paper in internet)). Other option is insert a linear search inside this algorithm (yes, both can be combined).
Your exe:
empty -8 cycles
BM13 308776 cycles, 1 ocurrences found
BM16 327836 cycles, 1 ocurrences found
BM17 344912 cycles, 1 ocurrences found
I've integrated your version 13 in the testbed (attached), here are the results:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Finding a string 5 times in a 100MB 'haystack' from\Masm32\include\Windows.inc
... using a 83 bytes substring [Doplicate include file windows.inc<crlf>echo ------------------------------------------]
577 ms for M32 InString result=105560497
133 ms for BMHBinsearch result=0
74 ms for BMBinSearch result=105560497
171 ms for Boyer-M Mineiro 13 result=105560497
62 ms for MB Instr_() result=105560497
... using a 71 bytes substring [Doplicate include file windows.inc<crlf>echo ------------------------------]
560 ms for M32 InString result=105560497
159 ms for BMHBinsearch result=105560497
80 ms for BMBinSearch result=105560497
233 ms for Boyer-M Mineiro 13 result=105560497
61 ms for MB Instr_() result=105560497
... using a 60 bytes substring [Doplicate include file windows.inc<crlf>echo -------------------]
559 ms for M32 InString result=105560497
160 ms for BMHBinsearch result=105560497
96 ms for BMBinSearch result=105560497
249 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 51 bytes substring [Doplicate include file windows.inc<crlf>echo ----------]
558 ms for M32 InString result=105560497
164 ms for BMHBinsearch result=105560497
113 ms for BMBinSearch result=105560497
237 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 43 bytes substring [Doplicate include file windows.inc<crlf>echo --]
559 ms for M32 InString result=105560497
176 ms for BMHBinsearch result=105560497
147 ms for BMBinSearch result=105560497
207 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 37 bytes substring [Doplicate include file windows.inc<crlf>e]
580 ms for M32 InString result=105560497
249 ms for BMHBinsearch result=105560497
126 ms for BMBinSearch result=105560497
371 ms for Boyer-M Mineiro 13 result=0
60 ms for MB Instr_() result=105560497
... using a 31 bytes substring [Doplicate include file windows.]
585 ms for M32 InString result=105560497
302 ms for BMHBinsearch result=105560497
144 ms for BMBinSearch result=105560497
444 ms for Boyer-M Mineiro 13 result=0
61 ms for MB Instr_() result=105560497
... using a 26 bytes substring [Doplicate include file win]
577 ms for M32 InString result=105560497
335 ms for BMHBinsearch result=105560497
242 ms for BMBinSearch result=105560497
489 ms for Boyer-M Mineiro 13 result=0
61 ms for MB Instr_() result=105560497
... using a 22 bytes substring [Doplicate include file]
584 ms for M32 InString result=105560497
347 ms for BMHBinsearch result=105560497
247 ms for BMBinSearch result=105560497
532 ms for Boyer-M Mineiro 13 result=0
60 ms for MB Instr_() result=105560497
... using a 19 bytes substring [Doplicate include f]
589 ms for M32 InString result=105560497
381 ms for BMHBinsearch result=105560497
538 ms for BMBinSearch result=105560497
538 ms for Boyer-M Mineiro 13 result=0
63 ms for MB Instr_() result=105560497
... using a 16 bytes substring [Doplicate includ]
593 ms for M32 InString result=105560497
426 ms for BMHBinsearch result=105560497
254 ms for BMBinSearch result=105560497
596 ms for Boyer-M Mineiro 13 result=0
62 ms for MB Instr_() result=105560497
... using a 14 bytes substring [Doplicate incl]
586 ms for M32 InString result=105560497
444 ms for BMHBinsearch result=105560497
305 ms for BMBinSearch result=105560497
642 ms for Boyer-M Mineiro 13 result=0
60 ms for MB Instr_() result=105560497
... using a 12 bytes substring [Doplicate in]
577 ms for M32 InString result=105560497
493 ms for BMHBinsearch result=105560497
451 ms for BMBinSearch result=105560497
693 ms for Boyer-M Mineiro 13 result=0
61 ms for MB Instr_() result=105560497
... using a 10 bytes substring [Doplicate ]
603 ms for M32 InString result=105560497
305 ms for BMHBinsearch result=105560497
163 ms for BMBinSearch result=105560497
821 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 9 bytes substring [Doplicate]
585 ms for M32 InString result=105560497
167 ms for BMHBinsearch result=105560497
142 ms for BMBinSearch result=105560497
845 ms for Boyer-M Mineiro 13 result=0
61 ms for MB Instr_() result=105560497
... using a 8 bytes substring [Doplicat]
579 ms for M32 InString result=105560497
160 ms for BMHBinsearch result=105560497
132 ms for BMBinSearch result=0
833 ms for Boyer-M Mineiro 13 result=0
62 ms for MB Instr_() result=105560497
... using a 7 bytes substring [Doplica]
578 ms for M32 InString result=105560497
167 ms for BMHBinsearch result=105560497
147 ms for BMBinSearch result=105560497
885 ms for Boyer-M Mineiro 13 result=0
63 ms for MB Instr_() result=105560497
... using a 6 bytes substring [Doplic]
584 ms for M32 InString result=105560497
187 ms for BMHBinsearch result=105560497
165 ms for BMBinSearch result=0
945 ms for Boyer-M Mineiro 13 result=105560497
62 ms for MB Instr_() result=105560497
**** last sub$ was [Doplic] -- hit any key ****
Quote from: jj2007 on August 14, 2023, 01:44:30 AM371 ms for Boyer-M Mineiro 13 result=0
444 ms for Boyer-M Mineiro 13 result=0
Strange no!? It's not finding some patterns.
Please, can you check for the same string not found in your tests by using the one I posted?
Thanks.
Please post your haystack/needle combination, or just use my source (attached above) to do it yourself.
Note that the Masm32 SDK BMHBinSearch & BMBinSearch algos also have problems. Your code works most of the time.
All others work all the time.
Quote from: jj2007 on August 14, 2023, 04:04:11 AMPlease post your haystack/needle combination, or just use my source (attached above) to do it yourself.
I can't, I'm not a basic guy.
QuoteNote that the Masm32 SDK BMHBinSearch & BMBinSearch algos also have problems. Your code works most of the time.
All others work all the time.
That one works to single search instance all the times. That was the error, I was able to find all strings. Not a problem, this one is ready to multiple instances.
Congrats, mineiro, this one does not produce any errors, and it's also much faster than the previous version :thumbsup:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Finding a string 5 times in a 100MB 'haystack' from\Masm32\include\Windows.inc
... using a 83 bytes substring [Doplicate include file windows.inc<crlf>echo ------------------------------------------]
581 ms for M32 InString result=105560497
130 ms for BMHBinsearch result=0
73 ms for BMBinSearch result=105560497
174 ms for Boyer-M Mineiro 13 result=105560497
59 ms for MB Instr_() result=105560497
... using a 71 bytes substring [Doplicate include file windows.inc<crlf>echo ------------------------------]
571 ms for M32 InString result=105560497
159 ms for BMHBinsearch result=105560497
82 ms for BMBinSearch result=105560497
230 ms for Boyer-M Mineiro 13 result=105560497
59 ms for MB Instr_() result=105560497
... using a 60 bytes substring [Doplicate include file windows.inc<crlf>echo -------------------]
570 ms for M32 InString result=105560497
167 ms for BMHBinsearch result=105560497
99 ms for BMBinSearch result=105560497
259 ms for Boyer-M Mineiro 13 result=105560497
59 ms for MB Instr_() result=105560497
... using a 51 bytes substring [Doplicate include file windows.inc<crlf>echo ----------]
570 ms for M32 InString result=105560497
169 ms for BMHBinsearch result=105560497
115 ms for BMBinSearch result=105560497
243 ms for Boyer-M Mineiro 13 result=105560497
61 ms for MB Instr_() result=105560497
... using a 43 bytes substring [Doplicate include file windows.inc<crlf>echo --]
572 ms for M32 InString result=105560497
182 ms for BMHBinsearch result=105560497
148 ms for BMBinSearch result=105560497
203 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 37 bytes substring [Doplicate include file windows.inc<crlf>e]
573 ms for M32 InString result=105560497
258 ms for BMHBinsearch result=105560497
121 ms for BMBinSearch result=105560497
139 ms for Boyer-M Mineiro 13 result=105560497
62 ms for MB Instr_() result=105560497
... using a 31 bytes substring [Doplicate include file windows.]
575 ms for M32 InString result=105560497
302 ms for BMHBinsearch result=105560497
146 ms for BMBinSearch result=105560497
150 ms for Boyer-M Mineiro 13 result=105560497
62 ms for MB Instr_() result=105560497
... using a 26 bytes substring [Doplicate include file win]
572 ms for M32 InString result=105560497
346 ms for BMHBinsearch result=105560497
238 ms for BMBinSearch result=105560497
250 ms for Boyer-M Mineiro 13 result=105560497
62 ms for MB Instr_() result=105560497
... using a 22 bytes substring [Doplicate include file]
577 ms for M32 InString result=105560497
370 ms for BMHBinsearch result=105560497
246 ms for BMBinSearch result=105560497
246 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 19 bytes substring [Doplicate include f]
572 ms for M32 InString result=105560497
383 ms for BMHBinsearch result=105560497
532 ms for BMBinSearch result=105560497
577 ms for Boyer-M Mineiro 13 result=105560497
61 ms for MB Instr_() result=105560497
... using a 16 bytes substring [Doplicate includ]
577 ms for M32 InString result=105560497
433 ms for BMHBinsearch result=105560497
259 ms for BMBinSearch result=105560497
254 ms for Boyer-M Mineiro 13 result=105560497
61 ms for MB Instr_() result=105560497
... using a 14 bytes substring [Doplicate incl]
574 ms for M32 InString result=105560497
457 ms for BMHBinsearch result=105560497
314 ms for BMBinSearch result=105560497
312 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 12 bytes substring [Doplicate in]
577 ms for M32 InString result=105560497
501 ms for BMHBinsearch result=105560497
460 ms for BMBinSearch result=105560497
472 ms for Boyer-M Mineiro 13 result=105560497
61 ms for MB Instr_() result=105560497
... using a 10 bytes substring [Doplicate ]
601 ms for M32 InString result=105560497
307 ms for BMHBinsearch result=105560497
165 ms for BMBinSearch result=105560497
849 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 9 bytes substring [Doplicate]
575 ms for M32 InString result=105560497
176 ms for BMHBinsearch result=105560497
144 ms for BMBinSearch result=105560497
310 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 8 bytes substring [Doplicat]
583 ms for M32 InString result=105560497
158 ms for BMHBinsearch result=105560497
131 ms for BMBinSearch result=0
339 ms for Boyer-M Mineiro 13 result=105560497
60 ms for MB Instr_() result=105560497
... using a 7 bytes substring [Doplica]
575 ms for M32 InString result=105560497
176 ms for BMHBinsearch result=105560497
149 ms for BMBinSearch result=105560497
377 ms for Boyer-M Mineiro 13 result=105560497
62 ms for MB Instr_() result=105560497
... using a 6 bytes substring [Doplic]
586 ms for M32 InString result=105560497
187 ms for BMHBinsearch result=105560497
166 ms for BMBinSearch result=0
432 ms for Boyer-M Mineiro 13 result=105560497
62 ms for MB Instr_() result=105560497
**** last sub$ was [Doplic] -- hit any key ****
Hi mineiro!!
Look like you are forgetting that zero terminated strings requiere a final zero :biggrin:
Text_test db 'aba aba aba aba abax'
String_test db 'bax'
Any way there is some problem.
That just means that Test_text is db 'aba aba aba aba abaxbax', 0
Anyway, that text isn't used in mineiro's source. With my haystack & needle(s) his latest version behaves correctly now.
:biggrin:
Quote from: jj2007 on August 14, 2023, 07:37:33 AMThat just means that Test_text is db 'aba aba aba aba abaxbax', 0
Yes, somebody put a zero!
Anyway, I was trying to run mineiro test.
Quote from: HSE on August 14, 2023, 07:51:11 AMYes, somebody put a zero!
The OS is nice, sometimes ;-)
Quote from: HSE on August 14, 2023, 06:46:57 AMHi mineiro!!
Look like you are forgetting that zero terminated strings requiere a final zero :biggrin:
Text_test db 'aba aba aba aba abax'
String_test db 'bax'
Any way there is some problem.
Hello sir HSE, that don't need end of string because search for bytes, not only string, it's a binary search.
Just to things get clear.
Version below search for a 1024 random bytes inside a ~100MB random bytes.
As you can guess, string search will fail into this testcase.
.data
fail db 0,0
bm.exe
empty -19 cycles
BM13 81306365 cycles, 1 ocurrences found
BM17 90373957 cycles, 1 ocurrences found
BinSearch 127647148 cycles, 1 ocurrences found
Quote from: mineiro on August 14, 2023, 08:16:00 AMVersion below search for a 1024 random bytes inside a ~100MB random bytes.
As you can guess, string search will fail into this testcase.
That is a horrible test :biggrin: In a test you must remove "luck".
:thumbsup: but enjoy your game.
Quote from: HSE on August 14, 2023, 08:25:28 AMThat is a horrible test :biggrin: In a test you must remove "luck".
:thumbsup: but enjoy your game.
Yes, maybe instead of using GlobalAlloc use VirtualAlloc ... . So no hidden zeros can appear, and page fauls can appear into some SSE string search functions.
Note: I forgot to remove/comment GlobalFree inside BinSearch function, please, in previous example, comment that line.
Just for sure, I'm attaching a version without that line.