The MASM Forum

Projects => Rarely Used Projects => RosAsm => Topic started by: guga on June 23, 2014, 08:15:35 AM

Title: Binary Search
Post by: guga on June 23, 2014, 08:15:35 AM
After searching for a better sort or binary scan algorithm i came up here

http://eigenjoy.com/2011/09/09/binary-search-revisited
http://eigenjoy.com/2011/01/21/worlds-fastest-binary-search/

Is this works ? If it does how can this code be translated to masm/rosasm?
Also, i´m not sure if i figure it out what this algo does....Is it a binary search or a sorting algo?




Binary Search Revisited
By Matt Pulver | Published: September 9, 2011

As ubiquitous as it is, the standard binary search algorithm can still be further optimized by using bit operations to iterate through its sorted list, in place of arithmetic. Admittedly, this is primarily an academic discussion, since the code improvement does not decrease the logarithmic complexity of the standard algorithm. Nevertheless, a well-developed programming intuition should by default implement (or at a minimum consider) a solution similar to the one presented here, prior to "the obvious" standard arithmetic solution.

Here's an example. We want to find the largest index i such that haystack[i] <= needle. Needle = 15, and haystack is a sorted list of the first 8 prime numbers, indexed by binary numbers:

000     2
001     3
010     5
011     7
100     11
101     13
110     17
111     19

Note first that the haystack index requires no more than 3 bits. Therefore we start with the highest order bit set: b=100. (This font denotes binary numbers here.) Is haystack[100] <= 15? Yes, 11<=15. Observe this means that the first bit of the index we are looking for is 1. We look at the next bit. Is haystack[110] <= 15? No, 17 <= 15 is false. This means the 2nd bit is 0. Finally, is haystack[101] <= 15? Yes, 13 <= 15. Therefore the index we are looking for is 101.

In essence, the main loop to find the index i is simply:

    for( ; b ; b >>= 1 )
        if( haystack[i|b] <= needle ) i |= b;

where b is the value with 1 bit set, that began with b=100 and i=0.

It is the natural structure of the bits within the index variable that automatically tracks both the upper and lower bounds of the search window for each iteration. Compare this with the less efficient and more verbose arithmetic performed in the main loops of standard binary search algorithms.

The full C++ code for the improved binary search algorithm fbsearch() is:

// Binary search revisited.

// Define only one of these:
// #define SETBIT_FAST
// #define SETBIT_FASTER
#define SETBIT_FASTEST

#ifdef SETBIT_FAST
#include <math.h>
#endif

// Return 1 << log_2(list_size-1), or 0 if list_size == 1.
// This sets the initial value of b in fbsearch().
inline unsigned init_bit( unsigned list_size )
{
#ifdef SETBIT_FAST
    return list_size == 1 ? 0 :
        1 << int( log(list_size-1) / M_LN2 );
#endif
#ifdef SETBIT_FASTER
    return list_size == 1 ? 0 :
        1 << ( sizeof(unsigned) << 3 ) - 1
             - __builtin_clz(list_size-1);
#endif
#ifdef SETBIT_FASTEST
    unsigned b;
    __asm__ ( "decl %%eax;"
              "je DONE;"
              "bsrl %%eax, %%ecx;" // BSR - Bit Scan Reverse (386+)
              "movl $1, %%eax;"
              "shll %%cl, %%eax;"
              "DONE:" : "=a" (b) : "a" (list_size)
    );
    return b;
#endif
}

// Return the greatest unsigned i where haystack[i] <= needle.
// If i does not exist (haystack is empty, or needle < haystack[0])
// then return unsigned(-1). T can be any type for which the binary
// operator <= is defined.
template <typename T>
unsigned fbsearch( const T haystack[], unsigned haystack_size,
                   const T& needle )
{
    if( haystack_size == 0 ) return unsigned(-1);
    unsigned i = 0;
    for( unsigned b = init_bit(haystack_size) ; b ; b >>= 1 )
    {
        unsigned j = i | b;
        if( haystack_size <= j ) continue;
        if( haystack[j] <= needle ) i = j;
        else
        {
            for( b >>= 1 ; b ; b >>= 1 )
                if( haystack[i|b] <= needle ) i |= b;
            break;
        }
    }
    return i || *haystack <= needle ? i : unsigned(-1);
}


And..why it is returning 5 ?????

int main()
{
    const int sorted_list[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23 };
    const unsigned list_size = sizeof(sorted_list)/sizeof(int);
    int needle = 15;
    cout << "fbsearch(sorted_list,"<<list_size<<','<<needle<<") = "
         << fbsearch(sorted_list,list_size,needle) << endl;
    // fbsearch(sorted_list,9,15) = 5
    return 0;
}



The only part i understood (well..kind of  :greensml:) was the nearest power of 2 that can be ported as:

Proc GetClosestPowerofTwo:
    Arguments @Value

    mov eax D@Value
    dec eax | jz L1>
    mov ecx 32
    bsr eax ecx
    shl eax cl
    L1:
EndP

The above code is a replacement of
;;


unsigned long upper_power_of_two(unsigned long v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;

}
In RosAsm it is something like:


    sub eax 1 | mov D@Value eax
    mov eax D@Value | sar eax 1 | or eax D@Value | mov D@Value eax
    mov eax D@Value | sar eax 2 | or eax D@Value | mov D@Value eax
    mov eax D@Value | sar eax 4 | or eax D@Value | mov D@Value eax
    mov eax D@Value | sar eax 8 | or eax D@Value | mov D@Value eax
    mov eax D@Value | sar eax 16 | or eax D@Value | mov D@Value eax
    mov eax D@Value | add eax 1
    sar eax 1
;;



So...about the fbsearch function, what does "needle" means ? and what namespace he is refering to A variable a char etc ?
Title: Re: Binary Search
Post by: anta40 on June 23, 2014, 09:18:20 AM
needle means the element you want to find
(hint: finding a needle in a haystack)

not sure about the "a" variable, though
Title: Re: Binary Search
Post by: guga on June 23, 2014, 01:47:39 PM
OK, i suceeded to port and optimize a little. But the results are not fully tested. So far, the algo seems fast, but the resulting position are not accurated.


[List: D$ 2, 3, 5, 7, 11, 13, 17 ,19, 23]
[ValuetoFInd: D$ 5]
        call BinarySearch List, 9, ValuetoFInd



Proc BinarySearch:
    Arguments @Array, @ArrayLen, @ValuetoFind
    Local @iCounter
    Uses ebx, edi, ecx, esi

    mov eax 0-1
    On D@ArrayLen = 0, ExitP

    mov D@iCounter 0
    ; calculate the closest power of 2 (log(X-1))/log(2)
    mov esi D@ArrayLen
    dec esi | jz L1>
        mov ecx 32
        bsr esi ecx
        shl esi cl
    L1:

    mov edi D@ValuetoFind
    mov ecx D@Array
    mov ebx esi

    ...If ebx <> 0
        .Do
            mov eax D$ecx+esi*4
            ..If eax <= D$edi
                mov D@iCounter esi
            ..Else
                shr ebx 1 | jz L1>>
                L2:
                    mov eax D@iCOunter | or eax ebx
                    mov eax D$ecx+eax*4
                    If eax <= D$edi
                        mov eax D@iCOunter | or eax ebx | mov D@iCounter eax
                    End_If
                    shr ebx 1 | jz L1>>
                    jmp L2<
            ..End_If
            shr ebx 1
            mov esi D@iCounter | or esi ebx

        .Loop_Until ebx = 0
L1:
    ...End_If   
;;
    ..If D@iCounter = 0
        mov eax D@Array
        mov ecx D@ValuetoFind
        mov edx D$eax
        If edx > D$ecx
            mov eax 0-1
        Else
            mov eax D@iCounter
        End_If
    ..Else
        mov eax D@iCounter
    ..End_If
;;
    mov eax D@Array
    mov ecx D@ValuetoFind
    mov edx D$eax
    .If_Or D@iCounter = 0, edx <= D$ecx
        mov eax D@iCounter
    .Else
        mov eax 0-1
    .End_If

EndP


The problem seems that either he goes to the proper position on return, or it goes back 1 position

For example, when using
[List: D$ 2, 3, 5, 7, 11, 13, 17 ,19, 23]
    [ValuetoFInd: D$ 17]
call BinarySearch List, 9, ValuetoFInd


He found value 17, but it assumes it is at position 5, (falling the number 13), insteas at position 6 (when it is located the value 17)