The MASM Forum

General => The Laboratory => Topic started by: daydreamer on February 11, 2025, 04:23:36 AM

Title: different compares clock cycle test ???
Post by: daydreamer on February 11, 2025, 04:23:36 AM
Hi
What speed differences are there between TEST, CMP, an FPU compare, SSE UCOMISS and SSE2 UCOMISD?
Title: Re: different compares clock cycle test ???
Post by: Villuy on February 11, 2025, 07:12:12 AM
Time each one like this:

.486p                          ; use .686p plus .xmm instead to assemble SSE compares such as ucomiss
.model flat, stdcall
option casemap:none

include C:\masm32\include\kernel32.inc
includelib C:\masm32\lib\kernel32.lib

.data?

PerformanceFrequency dword ?   ; each value is a 64-bit LARGE_INTEGER:
dword ?                        ; the unnamed dword holds the high half
PerformanceCount1 dword ?
dword ?
PerformanceCount2 dword ?
dword ?

TimeSpent dword ?

.code

Start:

invoke QueryPerformanceFrequency, offset PerformanceFrequency ; Ticks per second frequency
invoke QueryPerformanceCounter, offset PerformanceCount1 ; Execution start time
mov ecx, 1000000000            ; iteration count

CountTime:

test eax, 5                    ; instruction under test
loop CountTime                 ; LOOP's own cost is included: time an empty loop too and subtract it

invoke QueryPerformanceCounter, offset PerformanceCount2 ; Execution end time
mov eax, PerformanceCount2
sub eax, PerformanceCount1     ; Ticks elapsed (low dwords only: valid while fewer than 2^32 ticks elapse)
mov edx, 1000000               ; Converting ticks to microseconds
mul edx                        ; edx:eax = ticks * 1000000
div PerformanceFrequency       ; assumes the frequency fits in 32 bits
mov TimeSpent, eax             ; Execution time in microseconds

invoke ExitProcess, 0

end Start
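The tick-to-microsecond step at the end of that program can be written out in C++ as a sketch (not Villuy's code; `ticks_to_us` is a hypothetical helper name). Splitting the tick count into whole seconds plus a remainder keeps the intermediate product below frequency * 1000000, so the 64-bit multiply cannot overflow for realistic counter frequencies.

```cpp
#include <cstdint>

// Convert a QueryPerformanceCounter-style tick difference to microseconds.
// Hypothetical helper: `frequency` is ticks per second, as returned by
// QueryPerformanceFrequency. Whole seconds and the leftover ticks are
// converted separately so ticks * 1000000 is never formed directly.
std::uint64_t ticks_to_us(std::uint64_t ticks, std::uint64_t frequency)
{
    const std::uint64_t us_per_s = 1000000;
    std::uint64_t whole_seconds = ticks / frequency;
    std::uint64_t remainder     = ticks % frequency;
    return whole_seconds * us_per_s + remainder * us_per_s / frequency;
}
```

For example, 25000000 ticks at a 10 MHz counter is 2.5 seconds, i.e. 2500000 microseconds.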
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 11, 2025, 08:53:52 AM
I think the OP was asking for timings of these instructions, not code to measure them.
Title: Re: different compares clock cycle test ???
Post by: zedd151 on February 11, 2025, 09:02:33 AM
Quote from: NoCforMe on February 11, 2025, 08:53:52 AM
I think the OP was asking for timings of these instructions, not code to measure them.
I think that was a hint for daydreamer to time them for himself. I think...  :cool:
I, of course, could be mistaken. That's happened once before.  :tongue:
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 11, 2025, 09:11:54 AM
Why should you have to write a program to find these timings when they're probably published somewhere already? It's not like they're some kind of Top Sekrit info.
Title: Re: different compares clock cycle test ???
Post by: zedd151 on February 11, 2025, 09:33:23 AM
Quote from: NoCforMe on February 11, 2025, 09:11:54 AM
Why should you have to write a program to find these timings when they're probably published somewhere already? It's not like they're some kind of Top Sekrit info.
Because timings can differ wildly from one processor to another, and definitely between AMD and Intel for some instructions. The same holds true for cycle counts, judging by some of the threads in this very board (The Laboratory).
Title: Re: different compares clock cycle test ???
Post by: sinsi on February 11, 2025, 11:23:30 AM
Have a look at masm32\macros\timers.asm
You can count clock cycles and execution time with the macros.
They aren't in the MASM64 package though, but I think MichaelW ported them to 64-bit somewhere.
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 11, 2025, 01:30:16 PM
Macros, schmacros.
The code given above by Villuy shows how simple this is:

Title: Re: different compares clock cycle test ???
Post by: sinsi on February 11, 2025, 02:11:38 PM
counter_begin 100000,HIGH_PRIORITY_CLASS
;your code here
counter_end
;EAX has cycle count
:rolleyes:
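For anyone working in C++ rather than MASM, the same basic idea as `counter_begin`/`counter_end` (run the code many times, read the time-stamp counter before and after, divide by the loop count) can be sketched with the `__rdtsc` compiler intrinsic. This is an approximation, not a port of the macros: it does not change process priority or serialize the pipeline, and on many modern CPUs the TSC ticks at a constant rate that may differ from the actual core clock. `cycles_per_iteration` is a hypothetical name.

```cpp
#include <cstdint>
#ifdef _MSC_VER
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC/Clang (x86/x64 only)
#endif

// Average time-stamp-counter ticks per call of f over `loops` calls (loops > 0).
// Hypothetical helper: no priority change and no serialization, so treat the
// result as a rough estimate that also includes the loop and call overhead.
template <typename F>
double cycles_per_iteration(F f, unsigned loops)
{
    std::uint64_t start = __rdtsc();
    for (unsigned i = 0; i < loops; ++i)
        f();
    std::uint64_t stop = __rdtsc();
    return static_cast<double>(stop - start) / loops;
}
```

Timing an empty lambda with the same loop count gives a baseline to subtract from the measurement of the code under test.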
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 11, 2025, 03:27:36 PM
To answer the OP's question without writing code, you might want to check this document of Agner Fog's (https://www.agner.org/optimize/instruction_tables.pdf), which gives the following info for each instruction for many different processors:


As you can see, determining execution time for instructions is no simple thing. However, these tables at least offer the chance to compare instructions to see which ones may be relatively faster.
Of course, as they say, YMMV.
Title: Re: different compares clock cycle test ???
Post by: Villuy on February 11, 2025, 05:50:31 PM
In fact, if an instruction is needed in a given place, then it is needed; there is no point in thinking about its cost. But in the broader sense, optimization is so complex and unpredictable that there is nothing to rely on except the empirical method in each concrete case.
Title: Re: different compares clock cycle test ???
Post by: daydreamer on February 12, 2025, 06:15:21 AM
The REAL4 compares that can be done with a 32-bit integer CMP are +inf, -inf and zero (+0.0 has the same bit pattern as integer 0).
REAL8 compares are similar, but with a 64-bit integer CMP.
Are these faster?
Is UCOMISS faster than an FPU compare?
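For reference, the IEEE 754 bit patterns behind that idea are +inf = 0x7F800000, -inf = 0xFF800000 and +0.0 = 0 for REAL4, with the 64-bit equivalents for REAL8. One catch: -0.0 is 0x80000000, so a plain integer CMP against 0 misses it, while UCOMISS treats +0.0 and -0.0 as equal. The sketch below (hypothetical helper names) shows the integer tests; whether they beat UCOMISS probably depends on whether the value is already in a general-purpose register or in an XMM register.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret a float/double as an unsigned integer of the same size
// (memcpy is the well-defined way to do this in C++).
std::uint32_t bits_of(float f)  { std::uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
std::uint64_t bits_of(double d) { std::uint64_t u; std::memcpy(&u, &d, sizeof u); return u; }

// Integer-compare tests for special REAL4 (single-precision) values.
bool is_pos_inf_int(float f) { return bits_of(f) == 0x7F800000u; }
bool is_neg_inf_int(float f) { return bits_of(f) == 0xFF800000u; }
// Only +0.0 is all-zero bits; shifting out the sign bit also catches -0.0.
bool is_zero_int(float f)    { return (bits_of(f) << 1) == 0; }

// REAL8 equivalents use the 64-bit patterns.
bool is_pos_inf_int(double d) { return bits_of(d) == 0x7FF0000000000000ull; }
bool is_neg_inf_int(double d) { return bits_of(d) == 0xFFF0000000000000ull; }
bool is_zero_int(double d)    { return (bits_of(d) << 1) == 0; }
```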
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 12, 2025, 09:41:57 AM
Quote from: daydreamer on February 12, 2025, 06:15:21 AM
Real4 compares that can be done with integer 32 bit cmp is +inf ,-inf and zero 0.0 = zero 0
Real8 compares similar but with 64 bit integer cmp
Are these faster ?
Ucomiss faster than fpu compare ?

Look them up in that document I linked in my reply above (https://www.agner.org/optimize/instruction_tables.pdf).
They're all there. See for yourself.
Title: Re: different compares clock cycle test ???
Post by: daydreamer on February 13, 2025, 07:43:34 AM
What's happened to this forum? Nobody wants to test-run code anymore? :(
I have a comparison that might not have been tested yet:
a loop with many scalar compares and conditional jumps vs. packed compares.
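The two shapes being compared can be sketched in C++ with SSE intrinsics (hypothetical function names; x86/x64 only). The scalar version does one compare and one conditional branch per element; the packed version compares four floats with one CMPEQPS and turns the lane masks into a 4-bit integer with MOVMSKPS, so there is no branch per element. When timing this, keep in mind that an optimizing compiler may auto-vectorize the scalar loop.

```cpp
#include <cstddef>
#include <xmmintrin.h>   // SSE intrinsics (x86/x64)

// Scalar: one compare plus one conditional branch per element.
int count_equal_scalar(const float* a, std::size_t n, float key)
{
    int count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] == key)
            ++count;
    return count;
}

// Packed: CMPEQPS compares 4 floats at once; MOVMSKPS packs the 4 lane
// masks into bits 0..3 of an int, which are then added up.
int count_equal_packed(const float* a, std::size_t n, float key)
{
    const __m128 k = _mm_set1_ps(key);          // broadcast key to all lanes
    int count = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 eq = _mm_cmpeq_ps(_mm_loadu_ps(a + i), k);
        int mask = _mm_movemask_ps(eq);         // bit k set where lane k matched
        count += (mask & 1) + ((mask >> 1) & 1) + ((mask >> 2) & 1) + ((mask >> 3) & 1);
    }
    for (; i < n; ++i)                           // leftover 0..3 elements
        if (a[i] == key)
            ++count;
    return count;
}
```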
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 13, 2025, 08:03:24 AM
Well, go ahead and test it then.
You have access to all the tools you need, including macros if you want to go that route.
Report back to us with your results.
Title: Re: different compares clock cycle test ???
Post by: sinsi on February 13, 2025, 09:08:55 AM
Quote from: daydreamer on February 13, 2025, 07:43:34 AM
What's happened to this forum ?,nobody wants to test run code anymore ? :(
I have a comparison that might not been tested yet :
A loop with many scalar compares with conditional jumps vs packed compares ???

The skeleton code I wrote earlier is all you need; just add the code you want to test.
Title: Re: different compares clock cycle test ???
Post by: zedd151 on February 13, 2025, 09:32:36 AM
Quote from: daydreamer on February 13, 2025, 07:43:34 AM
What's happened to this forum ?,nobody wants to test run code anymore ? :(
It's usually the OP who supplies the testing method. This ensures that all testers use the same testbed (i.e., testing method), for a better comparison between the processors the test is run on, and even to reveal any differences between OSes etc. when testing the same function or algorithm.
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 13, 2025, 12:06:45 PM
We look forward to your test results, hopefully soon.
Title: Re: different compares clock cycle test ???
Post by: daydreamer on February 16, 2025, 09:35:19 PM
Scalar vs packed SSE compare

// primesx.cpp
// 32-bit MSVC only: inline _asm is not available on x64, and HADDPS needs SSE3.

#include "pch.h"   // Visual Studio precompiled header; remove if your project doesn't use one
#include <iostream>
using namespace std;
int pflag = 0;
// primes up to 19, followed by zero padding
alignas(16) float flut[]{ 2.0,3.0,5.0,7.0,11.0,13.0,17.0,19.0,0.0,0.0,0.0 };
alignas(16) float arr[]{ 0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0 };
char lut[]{ 0, 0, 2, 3, 0, 5, 0, 7, 0, 0,0,11,0,13,0,0,0,0,0,0 };
int main()
{
int i, j;
float f;
float fresult = 0;
    cout << "Primesx\n";
// print the byte lookup table for reference
for (i = 0; i < 14; i++) {
cout << i << " ";
if (lut[i] != 0) cout << "prime " << (int)lut[i] << " ";
}
cout << "\n\n\n\n";
// scalar version: one UCOMISS and conditional jump per table entry
for (j = 1; j < 14; j=j+1) {
f = (float)j; //test all floats
_asm {
push ebx

mov ecx, 8             ; 8 primes in flut (14 read past the end of the table)
lea ebx, flut          ; start at flut[0] (starting at flut[1] skipped 2.0)
mov pflag, 0
L2:
movss xmm0, f
movss xmm1, [ebx]
ucomiss xmm0, xmm1
jne L1                 ; not equal: try the next table entry
jp L1                  ; unordered (NaN): treat as not equal
mov pflag, 1           ; equal: found prime
movss fresult, xmm1
jmp L3                 ; have found prime, jump out of loop
L1:
add ebx, 4
dec ecx
jne L2
L3:
pop ebx
}
fresult = fresult * pflag;   // 0 when f was not found in the table
cout << fresult << " ";
}//j
cout << "\n";
// packed version: compare f against 4 table entries per CMPEQPS
for (j = 2; j < 20; j++) {
f = (float)j;

_asm {
push ebx
lea ebx, flut
movss xmm0, f
shufps xmm0, xmm0, 0   ; broadcast f to all 4 lanes
movups xmm1, [ebx]     ; flut[0..3]
movaps xmm3, xmm1
cmpeqps xmm1, xmm0     ; all-ones lane where flut[k] == f
andps xmm1, xmm3       ; keep the matching prime, zero elsewhere
movaps xmm7, xmm1
add ebx, 16
movups xmm1, [ebx]     ; flut[4..7]
movaps xmm3, xmm1
cmpeqps xmm1, xmm0
andps xmm1, xmm3
orps xmm1, xmm7        ; merge both halves
movups xmmword ptr arr, xmm1
haddps xmm1, xmm1      ; horizontal sum: at most one lane is nonzero
haddps xmm1, xmm1
movss fresult, xmm1
pop ebx
}
cout << "xmm reg float 0,float 1,float 2,float 3 : "<< arr[0] << " " << arr[1] << " " << arr[2] << " " << arr[3] << "\n";
cout << fresult << " zero = non prime\n";

}//second time j

}
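Since 64-bit MSVC does not accept inline `_asm` blocks, the packed lookup above can also be sketched with SSE intrinsics. This is a rewrite under stated assumptions, not daydreamer's code: `prime_or_zero` and `hsum_ps` are hypothetical names, and the horizontal sum uses an SSE1 shuffle/add sequence instead of the SSE3 HADDPS.

```cpp
#include <xmmintrin.h>   // SSE intrinsics; work on both x86 and x64

alignas(16) const float primes[8] = { 2, 3, 5, 7, 11, 13, 17, 19 };

// Horizontal sum of the 4 lanes using SSE1 only (no HADDPS needed).
static float hsum_ps(__m128 v)
{
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); // swap lanes within pairs
    __m128 sums = _mm_add_ps(v, shuf);                            // [0+1, 0+1, 2+3, 2+3]
    shuf = _mm_movehl_ps(shuf, sums);                             // bring the high pair down
    sums = _mm_add_ss(sums, shuf);                                // lane 0 = total
    return _mm_cvtss_f32(sums);
}

// Returns f if it is one of the 8 primes in the table, else 0.0f.
float prime_or_zero(float f)
{
    const __m128 key = _mm_set1_ps(f);        // broadcast, like SHUFPS xmm0, xmm0, 0
    __m128 lo = _mm_load_ps(primes);          // primes[0..3]
    __m128 hi = _mm_load_ps(primes + 4);      // primes[4..7]
    // CMPEQPS mask ANDed with the table keeps only the matching prime
    __m128 hit = _mm_or_ps(_mm_and_ps(_mm_cmpeq_ps(lo, key), lo),
                           _mm_and_ps(_mm_cmpeq_ps(hi, key), hi));
    return hsum_ps(hit);                      // at most one lane is nonzero
}
```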


Title: Re: different compares clock cycle test ???
Post by: jack on February 17, 2025, 12:30:29 AM
Where is "pch.h"?
Never mind.
Title: Re: different compares clock cycle test ???
Post by: NoCforMe on February 17, 2025, 06:16:19 AM
I don't see any results, only C code.
Title: Re: different compares clock cycle test ???
Post by: TimoVJL on February 17, 2025, 06:49:24 AM
Quote from: NoCforMe on February 17, 2025, 06:16:19 AM
I don't see any results, only C code.
C++ code with an inline assembler block.
C and C++ aren't the same thing.

I don't use C++, as I don't need it, and it isn't universal at all, as it can't share object files or libraries.