News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

ArrayIndex timings

Started by jj2007, February 06, 2019, 01:12:52 AM

Previous topic - Next topic

guga

Yes, ecx is the size of the table.

I´m testing it and so far, it is inside the range except those issues i told before.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

aw27

Imagine table size is 256 bytes.
You divide ecx by 2, next you multiply by 4, you get 512.
Since esi points to the table base how:
lea eax, [esi+4*ecx] is in range? How lea eax, [esi+512] is inside the table.

guga

A table of 256 bytes, means you have 64 dwords and not 512. (256/4), thus, the total size (amount of elements on the table is 64 and not 256)

Image esi pointing to the address 0401000 (MyTable).  Esi will never change. It´s a fixed address

lea load the effective adress of that DWORD and not a byte.

ecx = 256 bytes = 64 DWORDS

when you divide the size by 2 the resultant value (in DWORDS) are 32.

So, lea [esi+4*ecx] means. esi+4*32 (dwords). So, esi will points to the start of MyTable plus 128 bytes and not 512.




Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

The Size on the code is the total amount of elements on the array and not the total size in bytes
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

TBRANSO1

Quote from: jj2007 on February 06, 2019, 07:04:58 PM
Quote from: TBRANSO1 on February 06, 2019, 05:00:50 PMhaving played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!

Post the testbed, please.

I will do it for it, give me time to review your work so I can replicated it in these languages: Ruby, Python, Java and Crystal Ruby (a C-written, statically type, completely optimized, multi-threaded, compiled version of Ruby, extremely fast, sometimes faster than C, but just as fast as C).  But you know, it's not going to be contest for Ruby, Python and Java... I am going to guess that they could get close to double-digit seconds range for sequences of over 1m elements... not milliseconds, the only one to be somewhat competitive will be Crystal Ruby.

aw27

Quote
FoundDwords 76 in position 20. True position = 20
FoundDwords 314 in position 91. True position = 91
FoundDwords 2 in position 2. True position = 1
FoundDwords 418 in position 120. True position = 119
FoundDwords 998 in position 255. True position = 255
FoundDwords 684 in position 185. True position = 185
FoundDwords 173 in position 51. True position = 50
FoundDwords 447 in position 133. True position = 133
FoundDwords 205 in position 60. True position = 60
FoundDwords 538 in position 157. True position = 157
FoundDwords 39 in position 11. True position = 11
FoundDwords 516 in position 148. True position = 149
FoundDwords 950 in position 244. True position = 244
FoundDwords 465 in position 136. True position = 135
FoundDwords 27 in position 6. True position = 6
FoundDwords 38 in position 9. True position = 9
FoundDwords 521 in position 152. True position = 152
FoundDwords 319 in position 92. True position = 92
FoundDwords 135 in position 37. True position = 37
FoundDwords 889 in position 230. True position = 230
FoundDwords 313 in position 90. True position = 90
FoundDwords 876 in position 226. True position = 226
FoundDwords 366 in position 108. True position = 108
FoundDwords 303 in position 87. True position = 87
FoundDwords 629 in position 175. True position = 175
FoundDwords 245 in position 70. True position = 70
FoundDwords 344 in position 100. True position = 101
FoundDwords 153 in position 40. True position = 40
FoundDwords 800 in position 211. True position = 211
FoundDwords 521 in position 152. True position = 152
FoundDwords 533 in position 155. True position = 155
FoundDwords 257 in position 76. True position = 76
FoundDwords 240 in position 69. True position = 69
FoundDwords 247 in position 72. True position = 73
FoundDwords 108 in position 31. True position = 31
FoundDwords 303 in position 87. True position = 87
FoundDwords 245 in position 70. True position = 71
FoundDwords 653 in position 179. True position = 179
FoundDwords 27 in position 7. True position = 6
FoundDwords 26 in position 6. True position = 5
FoundDwords 333 in position 95. True position = 95
FoundDwords 489 in position 145. True position = 145
FoundDwords 906 in position 232. True position = 232
FoundDwords 650 in position 177. True position = 177
FoundDwords 833 in position 217. True position = 217
FoundDwords 444 in position 130. True position = 131
FoundDwords 341 in position 99. True position = 99
FoundDwords 861 in position 223. True position = 223
FoundDwords 238 in position 67. True position = 67
FoundDwords 479 in position 141. True position = 140
FoundDwords 516 in position 148. True position = 150
FoundDwords 653 in position 179. True position = 179
FoundDwords 924 in position 239. True position = 239
FoundDwords 511 in position 147. True position = 147
FoundDwords 584 in position 167. True position = 167
FoundDwords 418 in position 120. True position = 119
FoundDwords 398 in position 116. True position = 116
FoundDwords 886 in position 230. True position = 229
FoundDwords 213 in position 61. True position = 61
FoundDwords 96 in position 25. True position = 25
FoundDwords 164 in position 45. True position = 45
FoundDwords 653 in position 178. True position = 178
FoundDwords 29 in position 7. True position = 7
FoundDwords 611 in position 170. True position = 170
FoundDwords 684 in position 185. True position = 185
FoundDwords 478 in position 139. True position = 139
FoundDwords 175 in position 51. True position = 51
FoundDwords 862 in position 224. True position = 224
FoundDwords 423 in position 121. True position = 121
FoundDwords 908 in position 234. True position = 234
FoundDwords 969 in position 247. True position = 247
FoundDwords 516 in position 148. True position = 149
FoundDwords 84 in position 23. True position = 22
FoundDwords 611 in position 170. True position = 170
FoundDwords 51 in position 12. True position = 12
FoundDwords 245 in position 70. True position = 70
FoundDwords 975 in position 248. True position = 248
FoundDwords 478 in position 139. True position = 139
FoundDwords 165 in position 46. True position = 46
FoundDwords 516 in position 148. True position = 149
FoundDwords 788 in position 207. True position = 207
FoundDwords 636 in position 176. True position = 176
FoundDwords 541 in position 158. True position = 158
FoundDwords 772 in position 200. True position = 200
FoundDwords 58 in position 15. True position = 14
FoundDwords 118 in position 33. True position = 33
FoundDwords 551 in position 160. True position = 160
FoundDwords 247 in position 72. True position = 72
FoundDwords 365 in position 107. True position = 107
FoundDwords 247 in position 74. True position = 73
FoundDwords 983 in position 252. True position = 252
FoundDwords 128 in position 34. True position = 34
FoundDwords 335 in position 97. True position = 97
FoundDwords 418 in position 120. True position = 120
FoundDwords 443 in position 129. True position = 129
FoundDwords 473 in position 138. True position = 137
FoundDwords 629 in position 175. True position = 175
FoundDwords 341 in position 98. True position = 98
FoundDwords 435 in position 127. True position = 126
FoundDwords 288 in position 81. True position = 81

I tested 100 times in an array of 256 dwords and your algo appears to work. Yes there are a small number of values that do not match, it needs some investigation (could be due to duplication, I have not forced unique values).
It may also be possible that larger arrays will not diverge significantly in time because maximum number of iterations is about log2(sizeofarray)

jj2007

Quote from: AW on February 07, 2019, 03:08:00 AMI tested 100 times in an array of 256 dwords and your algo appears to work.

The testbed checked the values. The only problem here is that it fails for certain table sizes because of the way the shr ecx, 1 is being used. I have little time right now, but I keep working on a variant that works for all sizes but it mysteriously a factor 8 slower. Oh well...

aw27

I have removed the duplicates caused by the random generation and retested, so duplicates where the reason for mismatches when array sizes are 2^n in size.
When array sizes are not 2^n (for example 3000) in size there are a lot of issues.
The algorithm I provided earlier for doubles does not have those issues though.
I attach the sources, C and ASM, used for testing with doubles using my previous algo and with dwords using this current algo.

TimoVJL

I tested that fast_upper_bound3() C version with primes. Still have to check if it works correctly.
int fast_upper_bound3(int *vec, int value, int size)
{
    size_t low = 0;
    while (size > 0) {
        size_t half = size / 2;
        size_t other_half = size - half;
        size_t probe = low + half;
        size_t other_low = low + other_half;
        int v = vec[probe];
        size = half;
        //low = v >= value ? other_low : low;
        low = v >= value ? low : other_low;
    }
    return low;
}
Result: Time 23 ms 12800 499200 loops
jj2007 example gave: 27 ms for 499200 searches with 0 mismatches, table elements=12800

msvc 2010 Time 28 ms 12800 499200 loops
msvc 2017 Time 29 ms 12800 499200 loops
pcc32 Time 39 ms 12800 499200 loops
May the source be with you

LordAdef

Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

Timings for ArrayIndex():
5653 µs for 500000 searches with 0 mismatches, table elements=100
12 ms for 500000 searches with 0 mismatches, table elements=200
15 ms for 500000 searches with 0 mismatches, table elements=400
18 ms for 500000 searches with 0 mismatches, table elements=800
13 ms for 499200 searches with 0 mismatches, table elements=1600
13 ms for 499200 searches with 0 mismatches, table elements=3200
13 ms for 499200 searches with 0 mismatches, table elements=6400
13 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
38 ms for 500000 searches with 0 mismatches, table elements=200
66 ms for 500000 searches with 0 mismatches, table elements=400
135 ms for 500000 searches with 0 mismatches, table elements=800
257 ms for 499200 searches with 0 mismatches, table elements=1600
480 ms for 499200 searches with 0 mismatches, table elements=3200
1240 ms for 499200 searches with 0 mismatches, table elements=6400
1887 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---

guga

Tks again, AW. But, I´m not sure if i ported it correctly to masmbasic, but, if the porting is correct, the timings are slower compared to JJ´s

60   bytes for BinSearchJ
97   bytes for BinSearchG
0   1.12482723
1   1.43416376
2   1.58823120
3   1.64153149
4   2.28456212
5   3.36804147
6   3.37144616
...
253   99.2858763
254   99.4469785
255   99.6651549
Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
2000000 random searches performed in 175 ms - repnz scasd

BinSearch, mismatches not allowed
2000000 random searches performed in 144 ms
2000000 random searches performed in 134 ms
2000000 random searches performed in 128 ms
2000000 random searches performed in 134 ms
2000000 random searches performed in 134 ms
2000000 random searches performed in 129 ms
2000000 random searches performed in 127 ms
2000000 random searches performed in 129 ms

BinSearch, mismatches allowed
2000000 random searches performed in 160 ms
2000000 random searches performed in 141 ms
2000000 random searches performed in 138 ms
2000000 random searches performed in 149 ms
2000000 random searches performed in 140 ms
2000000 random searches performed in 140 ms
2000000 random searches performed in 147 ms
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

aw27

This shall work for arrays of any size.

I have used 2 testing methods and used an array of 12800 elements tested 500,000 times.
The first method is what appears to be the Masm Basic way: Values to test are not random, belong to the array and increase by one in each iteration.
The second method uses random values within the range of the array but may not belong to the array (this corresponds to the requirement, if I understood well).

I am using inline assembly language to simulate what MASM macros do.

Method: No Random.  Table elements: 12800. Elapsed time: 18 milliseconds
Method: Random.  Table elements: 12800. Elapsed time: 46 milliseconds



#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <Windows.h>

#define CALCULRANDOM 0
#define NUMTESTS 500000
#define NUMELEMENTS 12800
#define MAXIMUMDWORD 50000
#define MAXUPPERMARGIN 5

unsigned int typedef DWORD;

DWORD dwordArray[NUMELEMENTS];

//******************* Build random array ************************
static int comparedwords(const void * a, const void * b)
{
if (*(DWORD*)a > *(DWORD*)b) return 1;
else if (*(DWORD*)a < *(DWORD*)b) return -1;
else return 0;
}

void genDwords()
{
int i;
DWORD dupsFound;
srand(time(NULL));

for (i = 0; i < NUMELEMENTS; i++)
dwordArray[i] = rand() % MAXIMUMDWORD;

do // Remove duplicates
{
qsort(dwordArray, NUMELEMENTS, sizeof(DWORD), comparedwords);
dupsFound = 0;
for (i = 0; i < NUMELEMENTS - 1; i++)
{
if (dwordArray[i] == dwordArray[i + 1])
{
dwordArray[i + 1] = rand() % MAXIMUMDWORD;
dupsFound = 1;
}
}
} while (dupsFound);
}
// ********************************************************

/* Not used
_inline int binarySearchDword(DWORD sortedArray[], int first, int last, DWORD key)
{

while (first <= last) {
int mid = first + (last - first) / 2;

if (key == sortedArray[mid])
return mid;
else if (key < sortedArray[mid])
last = mid - 1;
else {
if (key < sortedArray[mid + 1])
return mid;
else
first = mid + 1;

}
}
return -1;
}
*/

int main()
{
int selectedPosition=0;
DWORD selectDword;
int retpos;
int testNum;
LARGE_INTEGER frequency;
LARGE_INTEGER t1, t2;
double elapsedTime;

genDwords();  // Generate random array of unique dwords

QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&t1);
for (testNum = 0; testNum < NUMTESTS; testNum++)
{
#if CALCULRANDOM==0
selectDword = dwordArray[selectedPosition];

#elif CALCULRANDOM==1
selectedPosition = rand() % NUMELEMENTS;
selectDword = dwordArray[selectedPosition];
selectDword += rand() % MAXUPPERMARGIN;
if (selectDword >= dwordArray[selectedPosition + 1])
selectDword = dwordArray[selectedPosition];
#endif
_asm
{
push esi
push edi
push ebx
mov esi, NUMELEMENTS - 1
mov ecx, 0
mov ebx, selectDword
startLoop:
mov eax, esi
sub eax, ecx
cdq
sub eax, edx
lea edx, dwordArray
sar eax, 1
add eax, ecx
mov edx, dword ptr[edx + eax * 4]
cmp ebx, edx
je short EndSearch
jae short checkElse
lea esi, [eax - 1]
jmp short EndLoop
checkElse:
lea ecx, dwordArray
cmp ebx, [ecx + eax * 4 + 4]
jb short EndSearch
lea ecx, [eax + 1]
EndLoop:
cmp ecx, esi
jle short startLoop
or eax, -1
EndSearch:
mov retpos, eax
pop ebx
pop edi
pop esi
}

//retpos = binarySearchDword(dwordArray, 0, NUMELEMENTS - 1, selectDword);
if (retpos != selectedPosition)
printf("****************FOUND MISMATCH ********************\n");
#if CALCULRANDOM==0
selectedPosition = ++selectedPosition % NUMELEMENTS;
#endif
}
QueryPerformanceCounter(&t2);
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
printf("Method: %s.  Table elements: %d. Elapsed time: %0.0f milliseconds\n", (CALCULRANDOM)?"Random":"No Random", NUMELEMENTS, elapsedTime);
return 0;
}

jj2007

There is also the CRT bsearch: mov eax, rv(crt_bsearch, addr tmpDW, addr table(0), 256, DWORD, addr cmpFunction)

It would be helpful if alternative algos were posted as somealgo proc uses esi edi pTable, matchtofind, sizeoftable

guga

Hi Aw

Tks:) Looks good, but a few problems.

1) Only works with Dwords and not Real8
2) Only works with positive numbers
3) It´s returning -1 whenever the value is outside the ranger when the necessary was that, on this situation it look for it´s extremes. Ex: Array  = 17,35, 44, 66, 99, 120 . If value to find = 5, it should return 0 in eax (The starting index/pos). If the value to find is 200, it should return 5 (The last index/pos)


Jj
QuoteIt would be helpful if alternative algos were posted as somealgo proc uses esi edi pTable, matchtofind, sizeoftable

Totally agree :t
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

LiaoMi

QuoteHere are the names of search implementations:

binary_search_std: direct call to std::lower_bound
binary_search_simple: basic binary search with branches
binary_search_branchless: branchless binary search
binary_search_branchless_UR: branchless binary search – fully unrolled
linearX_search_sse: linear search with break (SSE)
linear_search_sse: counting linear search (SSE)
linear_search_sse_UR: counting linear search (SSE) – fully unrolled
linear_search_avx: counting linear search (AVX)
linear_search_avx_UR: counting linear search (AVX) – fully unrolled



https://dirtyhandscoding.wordpress.com/2017/08/25/performance-comparison-linear-search-vs-binary-search/

+

https://stackoverflow.com/questions/18971401/sparse-array-compression-using-simd-avx2