The MASM Forum

Topic started by: prino on October 31, 2020, 11:58:41 PM

Title: Problem with BMBinSearch
Post by: prino on October 31, 2020, 11:58:41 PM
I'm using the BMBinSearch code in a Virtual Pascal program, unaltered except for

- adding an initial '@' to the labels,
- changing the two @F/@@ pairs into @F1 & @F2,
- doing a rep stosd for the 257 elements of the shift_table below,

and I just cannot get it to work, although it fails only for some strings. My code:

function BMBinSearch(startpos: longint;
            lpsource: pointer;
            srcLngth: longint;
            lpSubStr: pointer;
            subLngth: longint): longint; assembler; {&uses ebx,esi,edi} {&frame+}

var cval       : longint;
var shift_table: array [0..256] of longint;

asm
  { code from BMBinSearch }
end;

const _bigbuf  = 16777216;  {Use big buffers - less I/O                }

var ifile: file;

var ibuf : pointer;
var i    : longint;
var r    : longint;

const srch: string ='{Z+';

begin
  getmem(ibuf, _bigbuf);

  assign(ifile, 'd:\01-lift\01-data\lift.dat');   // File in liftdat.rar @ https://goo.gl/ZN3XAB
  reset(ifile, 1);

  blockread(ifile, ibuf^, _bigbuf, r);
  close(ifile);

  i:= BMBinSearch(0, ibuf, r, @srch[1], length(srch));
asm int 3;end;
end.

It basically refuses to find the '{Z+' string and points me to a '{Z-' one instead, somewhere in the middle of the file (it contains many of them)...

Should I make more changes to cater for C vs Pascal differences?
Title: Re: Problem with BMBinSearch
Post by: jj2007 on November 01, 2020, 12:53:46 AM
See Instr timings: Boyer-Moore not working (http://masm32.com/board/index.php?topic=2998.0). If you desperately need the function, I could prepare a DLL that does the job with Instr_(FAST, ...) (http://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1153). Is that still your Converting HLL to assembler (http://masm32.com/board/index.php?topic=3460.msg44345#msg44345) project?
Quote
                - the FAST option is typically about twice as fast as CRT strstr, but 3 to 4 times as fast when used with
                string arrays (Intel Core i5 timings for counting a rare word in an 800 MB file with 6 million lines):
                                232 ms for fast Instr_
                                795 ms for "normal" Instr_
                                999 ms for Masm32 InString
                                929 ms for CRT strstr
                - using FAST, binary search in haystacks containing zeros is possible by assigning the buffer size to edx:
                        mov edx, LastFileSize            ; any info on length of buffer can be used with edx