The MASM Forum

General => The Campus => Topic started by: Lonewolff on April 12, 2018, 03:15:46 PM

Title: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 03:15:46 PM
Hi Guys,

I am trying to convert my C/C++ frame rate counter to work in MASM. But I have come across a bit of a snag. I'm not sure how you go about handling 64 bit integers.


; Framerate counter
invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeEnd
;TimeElapsed.QuadPart = TimeEnd.QuadPart - TimeStart.QuadPart;
;TimeElapsed.QuadPart *= 1000000000;
;TimeElapsed.QuadPart /= TimeFrequency.QuadPart; // in nanoseconds
inc nCounter

;if (TimeElapsed.QuadPart > 1000000000)
.if 1 ; placeholder
invoke itoa, nCounter, addr szBuffer, 10
invoke SetWindowText, hWnd, addr szBuffer
mov nCounter, 0
invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeStart
.endif


Here is my partially converted code. The commented lines are C/C++

If anyone could assist, that would be truly appreciated  8)
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: hutch-- on April 12, 2018, 03:31:27 PM
If the number range is within DWORD then you probably only need to access the low DWORD of the 64 bit number. I gather this is 32 bit code ?
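
A minimal sketch of reading only the low DWORD, assuming the MASM32 SDK is installed at \masm32 and the counters are declared as plain `dq` variables (the names `TimeStart` and `TimeEnd` mirror the original post; the `Sleep` call is just a stand-in for real work). x86 is little-endian, so the low DWORD sits at offset 0 and the high DWORD at offset 4:

```asm
include \masm32\include\masm32rt.inc

.data?
TimeStart dq ?                          ; 64-bit counter values
TimeEnd   dq ?

.code
start:
    invoke QueryPerformanceCounter, addr TimeStart
    invoke Sleep, 100                   ; stand-in for the work being timed
    invoke QueryPerformanceCounter, addr TimeEnd

    mov eax, dword ptr TimeEnd          ; low DWORD  (offset 0)
    mov edx, dword ptr TimeEnd+4        ; high DWORD (offset 4), unused here
    sub eax, dword ptr TimeStart        ; elapsed ticks, low 32 bits only

    print str$(eax), " ticks elapsed (low DWORD only)", 13, 10
    exit
end start
```

The `dword ptr` override is what lets a 32-bit instruction touch half of a `dq` variable; without it the assembler reports an operand size mismatch.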
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 03:33:08 PM
Yep 32 bit code.

Could you please give an example of how to access the low DWORD of the 64 bit number?

Still getting my feet on the ground with the simple stuff. Tried a few different things but they don't compile.

Thanks again  :)
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 04:01:11 PM
I think I am a step closer, but I am on the edge of my knowledge of ASM here - LOL  :bgrin:


invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeEnd

;TimeElapsed.QuadPart = TimeEnd.QuadPart - TimeStart.QuadPart;
mov eax,DWORD PTR TimeEnd[0]
sub eax,DWORD PTR TimeStart[0]
mov ecx,DWORD PTR TimeEnd[+4]
sbb ecx,DWORD PTR TimeStart[+4]
mov DWORD PTR TimeElapsed[0], eax
mov DWORD PTR TimeElapsed[+4], ecx

;TimeElapsed.QuadPart *= 1000000000;                             // Not sure what to do here
;TimeElapsed.QuadPart /= TimeFrequency.QuadPart; // Not sure what to do here
inc nCounter

;if (TimeElapsed.QuadPart > 1000000000)                           // Not sure what to do here
.if 1 ; placeholder
invoke itoa, nCounter, addr szBuffer, 10
invoke SetWindowText, hWnd, addr szBuffer
mov nCounter, 0
invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeStart
.endif


If anyone can assist in helping fill in the blanks, it would be truly awesome.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on April 12, 2018, 05:48:31 PM
The simple solution:
  NanoTimer()   ; http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1171
  invoke Sleep, 1000
  Inkey NanoTimer$()


But if you want to roll your own, it's good to know that the FPU understands perfectly what a QWORD integer is:
include \masm32\include\masm32rt.inc

.data?
timeStart dq ?
timeEnd dq ?
timeFrequency dq ?
timeElapsed dq ?

.code
start:
  invoke QueryPerformanceFrequency, addr timeFrequency
  invoke QueryPerformanceCounter, addr timeStart
  invoke Sleep, 3000
  invoke QueryPerformanceCounter, addr timeEnd
  fild timeEnd
  fild timeStart
  fsub
  fild timeFrequency
  fdiv
  fistp timeElapsed
  inkey str$(dword ptr timeElapsed), " seconds elapsed"
  exit
end start
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 12, 2018, 06:07:32 PM
i know the feeling....


I copied/pasted some related stuff from my own code, without much checking. But I hope some of it may help.
It may even have some bugs, although it's actually working ok.


.data
       SchedulerMS dd 1 ; granularity for Sleep
PerfCountFreq dd 0
LastCounter dd 0
EndCounter dd 0
ElapsedCounter dd 0
tFPS dd 0
MSPerFrameR real8     0.0
SleepMS sdword 0
TargetSecPerFrame real8 16.0



In the code:

QueryPerformance... uses LARGE_INTEGER, which is a union, but you can deal with it straight as a dword:


....
inv QueryPerformanceFrequency, ADDR PerfCountFreq



in the game loop:
...

inv QueryPerformanceCounter, ADDR LastCounter



....


inv QueryPerformanceCounter, ADDR EndCounter
mov ecx, LastCounter
mov eax, EndCounter
sub eax, ecx
mov edx, 1000
mov ElapsedCounter, eax
mul edx

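push eax
fild dword ptr [esp]
add esp, 4 ; release the temporary pushed above
fidiv PerfCountFreq
fstp MSPerFrameR

mov eax, PerfCountFreq
xor edx, edx ; div is unsigned, so clear EDX instead of sign-extending with cdq
div dword ptr ElapsedCounter
mov tFPS, eax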

m2m LastCounter, EndCounter
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 06:17:55 PM
This is a piece of timer code I use to calculate the FrameTime delta.
You only need the low 32bit part to calculate the time between screen refreshes.

FramesPerSecond = (1.0 / FrameTimeDelta )

.const
QPinteger struct
    Low32bit    dd ?
    High32bit   dd ?
QPinteger ends

float1          real4 1.0

.data?
align 8
FrameTimeOld    QPinteger <?>
FrameTimeNew    QPinteger <?>

TicksPerSecondReciprocal real4 ?
FrameTimeDelta           real4 ?


.code

InitTimer proc
    invoke   QueryPerformanceCounter,addr FrameTimeOld
    invoke   QueryPerformanceFrequency,addr FrameTimeNew
    movss    xmm0,float1
    cvtsi2ss xmm1,FrameTimeNew.Low32bit
    divss    xmm0,xmm1
    movss    TicksPerSecondReciprocal,xmm0
    ret
InitTimer endp


Update_frame proc
    invoke   QueryPerformanceCounter,addr FrameTimeNew
    mov      eax,FrameTimeNew.Low32bit
    mov      ecx,eax
    sub      eax, FrameTimeOld.Low32bit
    mov      FrameTimeOld.Low32bit,ecx
    cvtsi2ss xmm0,eax
    mulss    xmm0,TicksPerSecondReciprocal
    movss    FrameTimeDelta,xmm0 ; FramesPerSecond == 1 / FrameTimeDelta
    ret
Update_frame endp


EDIT: code adjustment!
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 06:25:45 PM
Awesome, thanks for the advice.  8)

How would I compare TimeElapsed against 1000000000 to see if it is greater?

I can't use something like the following as it doesn't fit in EAX


mov eax, TimeElapsed
mov ebx, 1000000000
cmp ebx, eax
jg greater


Could you get away with just the low byte or something?
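
One way to do the full 64-bit comparison in 32-bit code is to test the high DWORD first and only then the low DWORD. This is a sketch under a few assumptions: the value is treated as unsigned, the MASM32 SDK is at \masm32, and the test value (5000000000, stored as two DWORDs) is made up for illustration:

```asm
include \masm32\include\masm32rt.inc

.data
TimeElapsed dd 2A05F200h, 1             ; hypothetical value 5000000000 (low DWORD, high DWORD)

.code
start:
    cmp dword ptr TimeElapsed+4, 0      ; any bit set in the high DWORD?
    jne greater                         ; then the value is at least 2^32, so > 1000000000
    cmp dword ptr TimeElapsed, 1000000000
    ja  greater                         ; unsigned compare of the low DWORD
    print "not greater", 13, 10
    jmp done
greater:
    print "greater", 13, 10
done:
    exit
end start
```

Note the unsigned `ja` rather than `jg`: a low DWORD above 7FFFFFFFh would look negative to a signed jump.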

Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 06:31:52 PM
This is what I presently have but the code after the compare never gets executed.

The aim is to display the frame rate at one second intervals in the window title area.

Am I on the right track?


; Framerate counter (Work In Progress)
invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeEnd
fild TimeEnd
fild TimeStart
fsub
fild TimeFrequency
fdiv
fistp TimeElapsed
 
inc nCounter

mov eax, DWORD PTR TimeElapsed[0]
mov ebx, 1000000000
cmp ebx, eax
jg skip

; ** Never gets called **
invoke itoa, nCounter, addr szBuffer, 10
invoke SetWindowText, hWnd, addr szBuffer
mov nCounter, 0
invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeStart

skip:
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 06:33:30 PM
I adjusted the code in my previous post.

The Update_frame proc would be something like this:


FrameCounter                dd 0
TimeCounter                 real4 0.0
FrameTimeCounter            real4 0.0
FramesPerSecond             real4 0.0


    invoke      QueryPerformanceCounter,addr FrameTimeNew
    mov         eax,FrameTimeNew.Low32bit
    mov         ecx,eax
    sub         eax,FrameTimeOld.Low32bit
    mov         FrameTimeOld.Low32bit,ecx
    cvtsi2ss    xmm0,eax
    mulss       xmm0,TicksPerSecondReciprocal
    movss       FrameTimeDelta,xmm0 ; FPS = 1 / FrameTimeDelta

    movss       xmm1,TimeCounter
    addss       xmm1,xmm0
    movss       TimeCounter,xmm1

    inc         FrameCounter
    movss       xmm1,FrameTimeCounter
    addss       xmm1,xmm0
    comiss      xmm1,FLT4(1.0)
    jb          PerSecond
    cvtsi2ss    xmm0,FrameCounter
    divss       xmm0,xmm1
    movss       FramesPerSecond,xmm0  ; update per second
    mov         FrameCounter,0       
    xorps       xmm1,xmm1
PerSecond:   
    movss       FrameTimeCounter,xmm1
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 12, 2018, 06:41:29 PM
Marinus, is not using the FPU a matter of personal taste, or is there a performance gain? As far as I have read, the FPU still holds up nicely, right?


edit to add: the reason I'm curious is because sometimes you also use FPU, got me thinking
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 06:50:55 PM
Seem to have it working now  :icon_cool:

Needed to throw a multiplication of 1000000000 in there.


; Framerate counter (Work In Progress)
invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeEnd
fild TimeEnd
fild TimeStart
fsub
fild TimeNanoSecond ; 1000000000
fmul
fild TimeFrequency
fdiv
fistp TimeElapsed

  inc nCounter

mov eax, DWORD PTR TimeElapsed[0]
mov ebx, 1000000000
cmp eax, ebx
jl skip
invoke itoa, nCounter, addr szBuffer, 10
invoke SetWindowText, hWnd, addr szBuffer
mov nCounter, 0
invoke QueryPerformanceFrequency, addr TimeFrequency
invoke QueryPerformanceCounter, addr TimeStart
skip:


I must be missing some optimisation techniques somewhere as my C++ loop (using the same render code) is 1000 FPS faster than the ASM loop.

C++ render loop is ~7000 FPS
ASM render loop is ~6000 FPS

Not a bad comparison though.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 12, 2018, 06:59:57 PM
One thing I noticed is (as far as I know) you only need to invoke QueryPerformanceFrequency once, outside and prior to the loop.


You will be receiving the same value all the time.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 07:03:30 PM
True. I could take out one of the calls.

But if I take out both (and place a single call prior to the loop) systems that throttle clock speed (the ones that are too smart for their own good) will get incorrect results.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on April 12, 2018, 07:05:34 PM
Quote from: Lonewolff on April 12, 2018, 06:50:55 PM
I must be missing some optimisation techniques somewhere as my C++ loop (using the same render code) is 1000 FPS faster than the ASM loop.

Check where the bottleneck is... as far as the timing functions are concerned, they have low overhead, but you could, for example,
- call frequency only once before the loop (it won't change)
- if you use it inside the loop, use QueryPerformanceCounter only once (old end = new start time)
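
The two suggestions above can be sketched as follows. This assumes the MASM32 SDK at \masm32; the variable names, the five-iteration loop and the `Sleep` call standing in for rendering are all hypothetical:

```asm
include \masm32\include\masm32rt.inc

.data?
timeFrequency dq ?
timeLast      dq ?
timeNow       dq ?
deltaTicks    dq ?

.code
start:
    invoke QueryPerformanceFrequency, addr timeFrequency   ; once, before the loop
    invoke QueryPerformanceCounter, addr timeLast
    mov ebx, 5                          ; hypothetical: run five "frames"

frame:
    invoke Sleep, 16                    ; stand-in for the render work
    invoke QueryPerformanceCounter, addr timeNow            ; the only counter call per frame

    mov eax, dword ptr timeNow          ; delta = now - last (64-bit subtract)
    mov edx, dword ptr timeNow+4
    sub eax, dword ptr timeLast
    sbb edx, dword ptr timeLast+4
    mov dword ptr deltaTicks, eax
    mov dword ptr deltaTicks+4, edx

    mov eax, dword ptr timeNow          ; old end = new start
    mov edx, dword ptr timeNow+4
    mov dword ptr timeLast, eax
    mov dword ptr timeLast+4, edx

    print str$(dword ptr deltaTicks), " ticks", 13, 10
    dec ebx                             ; EBX survives the API and library calls
    jnz frame
    exit
end start
```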
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 12, 2018, 07:05:56 PM
Quote from: Lonewolff on April 12, 2018, 07:03:30 PM
True. I could take out one of the calls.

But if I take out both (and place a single call prior to the loop) systems that throttle clock speed (the ones that are too smart for their own good) will get incorrect results.


That's true too, have you benchmarked with and without it? I'm away from the computer but got curious
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 07:10:00 PM
About the same.

I think it is more the 'DX11 render cycle' that is the bottleneck.

Just had a thought though. The ASM version is built with ML and Link that is supplied with MASM32. But the C++ version is built with the versions supplied with VS2017. I wonder if that is the source of the difference.

Gonna grab something to eat and I'll report back when I build the ASM versions with the 2017 compiler.  :icon_cool:
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 07:12:40 PM
Quote from: LordAdef on April 12, 2018, 06:41:29 PM
Marinus, is not using the FPU a matter of personal taste, or is there a performance gain? As far as I have read, the FPU still holds up nicely, right?


edit to add: the reason I'm curious is because sometimes you also use FPU, got me thinking

Hi Alex,

When coding graphics and audio, I mainly use SIMD and not FPU because it can move more data around at greater speed.
When possible I don't mix SIMD and FPU; that's why I used scalar SIMD for the timer code.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 07:18:29 PM
Quote from: Lonewolff on April 12, 2018, 06:50:55 PM
I must be missing some optimisation techniques somewhere as my C++ loop (using the same render code) is 1000 FPS faster than the ASM loop.

C++ render loop is ~7000 FPS
ASM render loop is ~6000 FPS

Not a bad comparison though.


Are the message pump loops the same for ASM and C++?
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 07:28:09 PM
Quote from: Lonewolff on April 12, 2018, 07:03:30 PM
True. I could take out one of the calls.

But if I take out both (and place a single call prior to the loop) systems that throttle clock speed (the ones that are too smart for their own good) will get incorrect results.

Never noticed that ( my system does throttle the clock speed )
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 07:31:44 PM
Quote from: Siekmanski on April 12, 2018, 07:18:29 PM
Are the message pump loops the same for ASM and C++?

Yep, making sure I keep the code the same so we are comparing apples with apples.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 07:40:24 PM
Just changed all of the libs to the 2017 SDK versions and the frame rate is now on par with the C++ version.

Couldn't compile with the 2017 ML.exe as it is complaining about invalid operands on a couple of my calls. Will look a bit closer to see if I am doing something wrong on that front.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 07:47:24 PM
Just curious, was it the d3d11.lib ?
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 07:57:42 PM
Was already using d3d11.lib from the SDK.

Copied the others across - gdi32.Lib, kernel32.Lib, and user32.Lib.


Not sure why the new version of ML.exe doesn't like the project though. Something to do with the coinvoke macro?

Quote
error A2070:invalid instruction operands coinvoke(16): Macro Called From project.asm(228): Main Line Code
error A2070:invalid instruction operands coinvoke(16): Macro Called From project.asm(233): Main Line Code
error A2070:invalid instruction operands coinvoke(16): Macro Called From project.asm(238): Main Line Code


The corresponding lines of code;


(line 228) coinvoke d3dDevice, ID3D11Device, CreateVertexShader, addr vertexShaderData, SIZEOFvertexShaderData, NULL, addr d3dVertexShader

(line 233) coinvoke d3dDevice, ID3D11Device, CreatePixelShader, addr pixelShaderData, SIZEOFpixelShaderData, NULL, addr d3dPixelShader

(line 238) coinvoke d3dDevice, ID3D11Device, CreateInputLayout, addr inputDescP, 1, addr vertexShaderData, SIZEOFvertexShaderData, addr d3dInputLayout



[edit]
Worked it out. The new compiler doesn't like the way I am doing 'sizeof'. I'll work that out another day  :biggrin:
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 08:10:21 PM
ole32.lib perhaps?

(SIZEOF vertexShaderData)
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 08:21:56 PM
Quote from: Siekmanski on April 12, 2018, 08:10:21 PM
ole32.lib perhaps?

(SIZEOF vertexShaderData)

Yeah, 'SIZEOF vertexShaderData' doesn't work because the declaration is multi-line, so SIZEOF only counts the first db line. (The shader is hard coded at present.)


vertexShaderData db 68,88,66,67,166,109,78,113,107,98,65,70,91,88,250,161,103,22,241,76,1,0,0,0,16,2,0,0,6,0,0,0,56,0,0,0,156,0,0,0,224,0,0,0,92,1,0,0
db 168,1,0,0,220,1,0,0,65,111,110,57,92,0,0,0,92,0,0,0,0,2,254,255,52,0,0,0,40,0,0,0,0,0,36,0,0,0,36,0,0,0,36,0,0,0,36,0,1
db 0,36,0,0,0,0,0,1,2,254,255,31,0,0,2,5,0,0,128,0,0,15,144,4, 0,0,4,0,0,3,192,0,0,255,144,0,0,228,160,0,0,228,144,1,0,0,2,0,0
db 12,192,0,0,228,144,255,255,0,0,83,72,68,82,60,0,0,0,64,0,1,0,15,0,0,0,95,0,0,3,242,16,16,0,0,0,0,0,103,0,0,4,242,32,16,0,0,0,0
db 0,1,0,0,0,54,0,0,5,242,32,16,0,0,0,0,0,70,30,16,0,0,0,0,0,62,0,0,1,83,84,65,84,116,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0
db 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 0,0,0,0,0,0,82,68,69,70,68,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,28,0,0,0,0,4,254,255,0,1,0,0,28,0,0,0,77,105,99,114,111,115,111
db 102,116,32,40,82,41,32,72,76,83,76,32,83,104,97,100,101,114,32,67,111,109,112,105,108,101,114,32,49,48,46,49,0,73,83,71,78,44,0,0,0,1,0,0,0,8,0,0,0
db 32,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,15,15,0,0,80,79,83,73,84,73,79,78,0,171,171,171,79,83,71,78,44,0,0,0,1,0,0,0,8
db 0,0,0,32,0,0,0,0,0,0,0,1,0,0,0,3,0,0,0,0,0,0,0,15,0,0,0,83,86,95,80,79,83,73,84,73,79,78,0
SIZEOFvertexShaderData EQU $-vertexShaderData


So this is how I am calculating 'sizeof' until I code a better solution.

New compiler doesn't like that very much, where the old one seems ok with it.

Project doesn't link to ole32.lib.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on April 12, 2018, 08:30:27 PM
An old problem with recent Micros**t assemblers. Try this:
mov ecx, SIZEOFvertexShaderData
coinvoke d3dDevice, ID3D11Device, CreateVertexShader, addr vertexShaderData, ecx, NULL, addr d3dVertexShader


If that doesn't work:
vertexShaderData        db 68 ....
vertexShaderDataEnd     db 0
...
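mov ecx, offset vertexShaderDataEnd ; address of the end marker, not its contents
sub ecx, offset vertexShaderData    ; ecx = size of the shader data in bytes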
coinvoke d3dDevice, ID3D11Device, CreateVertexShader, addr vertexShaderData, ecx, NULL, addr d3dVertexShader
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 08:36:57 PM
I thought your data was in a structure member; then you can use (sizeof vertexShaderData).
Your solution should work too.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 12, 2018, 08:41:59 PM
Thanks JJ2007, a couple of things to try.

Nah Siekmanski, my data is all nasty and flapping about at the moment - LOL
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 08:52:41 PM
-> Project doesn't link to ole32.lib.

CoInitialize and CoUninitialize do need ole32.lib, needed for COM.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: hutch-- on April 12, 2018, 08:54:01 PM
The API is simple to use.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL var1  :DWORD

    push esi
    push edi

    lea esi, var1                           ; load the address
    mov edi, 100                            ; set the counter

  @@:
    invoke QueryPerformanceCounter, esi     ; call the API
    print str$([esi]),13,10                 ; display low DWORD
    sub edi, 1                              ; decrement counter
    jnz @B                                  ; loop again if not 0

    pop edi
    pop esi

    ret

main endp


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 12, 2018, 08:58:16 PM
var1 should be QWORD size
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: nidud on April 12, 2018, 09:32:53 PM
deleted
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: hutch-- on April 12, 2018, 09:56:57 PM
You are right, it should be in 32 bit,

    LOCAL var1  :DWORD
    LOCAL dumm  :DWORD

I added it later.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: nidud on April 12, 2018, 10:21:51 PM
deleted
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: dedndave on April 13, 2018, 01:52:43 AM
you might use the LARGE_INTEGER structure, as defined in windows.inc

LARGE_INTEGER UNION
    STRUCT
      LowPart  DWORD ?
      HighPart DWORD ?
    ENDS
  QuadPart QWORD ?
LARGE_INTEGER ENDS


handy, because you can access the values as either 2 DWORDs or 1 QWORD

    LOCAL  liPerfCtr   :LARGE_INTEGER
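
A short sketch of that union in use, assuming the MASM32 SDK at \masm32 and the `LARGE_INTEGER` definition quoted above; the procedure name `ShowCounter` is made up for the example. Note that `str$` prints signed values, so a low DWORD above 7FFFFFFFh shows up as negative:

```asm
include \masm32\include\masm32rt.inc

.code
ShowCounter proc                        ; hypothetical helper name
    LOCAL liPerfCtr :LARGE_INTEGER

    invoke QueryPerformanceCounter, addr liPerfCtr

    ; the same 8 bytes, viewed as two DWORDs ...
    print str$(liPerfCtr.HighPart), " (high DWORD)", 13, 10
    print str$(liPerfCtr.LowPart), " (low DWORD)", 13, 10

    ; ... or as one QWORD, e.g. for the FPU
    fild liPerfCtr.QuadPart
    fstp st(0)                          ; discard, only shown for the access
    ret
ShowCounter endp

start:
    call ShowCounter
    exit
end start
```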
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 13, 2018, 02:04:17 AM
Wait,
what Hutch was doing at first is what I am doing: simply reading the first dword straight away already gives us what we want. We don't need any extra work unless we want the full qword.


Or I am missing something?



Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: nidud on April 13, 2018, 02:17:46 AM
deleted
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 13, 2018, 02:28:09 AM
But I mean,

queryperformancecounter, addr temp
mov eax, temp

that's the dword we need, right? The low dword is the first data in the union, so this should suffice
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on April 13, 2018, 02:34:39 AM
Most of the time the DWORD is enough; occasionally, you'll get overflow, though.
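
To illustrate with made-up numbers: when the low DWORD wraps from FFFFFFFFh back to 0 between two readings, the 32-bit subtraction still yields the right difference, because it works modulo 2^32. The result only goes wrong when the interval itself is 2^32 ticks or longer. A sketch, assuming the MASM32 SDK at \masm32:

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov eax, 00000010h                  ; hypothetical new low DWORD, just after the wrap
    sub eax, 0FFFFFFF0h                 ; hypothetical old low DWORD, just before the wrap
    print str$(eax), " ticks", 13, 10   ; 10h - 0FFFFFFF0h = 20h = 32 (mod 2^32)
    exit
end start
```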

But what's wrong with the complete solution (http://masm32.com/board/index.php?topic=7060.msg75722#msg75722) that I posted in reply #4? Plain Masm32 8)

Btw if you need milliseconds instead of seconds, just insert a line as shown below, and adjust the unit:
include \masm32\include\masm32rt.inc

.data?
timeStart dq ?
timeEnd dq ?
timeFrequency dq ?
timeElapsed dq ?

.code
start:
  invoke QueryPerformanceFrequency, addr timeFrequency
  invoke QueryPerformanceCounter, addr timeStart
  invoke Sleep, 300
  invoke QueryPerformanceCounter, addr timeEnd
  fild timeEnd
  fild timeStart
  fsub
  fild timeFrequency
  fdiv
  fmul FP4(1000.0) ; to get milliseconds instead of seconds
  fistp timeElapsed
  inkey str$(dword ptr timeElapsed), " ms elapsed"
  exit
end start
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Lonewolff on April 13, 2018, 06:57:49 AM
Quote from: Siekmanski on April 12, 2018, 08:52:41 PM
-> Project doesn't link to ole32.lib.

CoInitialize and CoUninitialize do need ole32.lib, needed for COM.

Weird. Definitely not linking to this yet COM is working perfectly.

Maybe something in d3d11.lib?

I don't call CoInitialize or CoUninitialize anywhere either. Not even in my C++ code. Haven't had to do that since DX9. Maybe DX11 does this under the hood?


[edit]
Arggh! My bad. I am including masm32rt.inc which links to ole32.lib.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 13, 2018, 08:57:54 AM
I'm just following the rules.  :biggrin:

EDIT: I'm not certain now if it is really necessary for DirectX.  ::)
I never used CoCreateInstance as far as I can remember in my DirectX code.
Only used it in DirectShow I think.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms678543(v=vs.85).aspx
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 13, 2018, 09:29:19 PM
Guys,

Allow me to keep this topic going for a bit longer.

Since Hutch, Marinus, JJ and I are going down different routes, I wonder how the benchmarks compare (I haven't run them yet). It would be an interesting comparison, though.

I'm dealing with dwords and doing the stuff in cpu prior to FPU. Marinus is going SIMD with dd, and JJ is full dq.
My approach was:


.data
    SchedulerMS dd 1 ; granularity for Sleep
PerfCountFreq dd 0
LastCounter dd 0
EndCounter dd 0
ElapsedCounter dd 0
tFPS dd 0
MSPerFrame real8 0.0
SleepMS sdword 0
TargetSecPerFrame real8 16.0


.code
inv timeBeginPeriod, SchedulerMS
.IF eax != TIMERR_NOERROR
       console "ATTENTION: timeBeginPeriod failed!" ; (console is my printf macro)
.ENDIF
inv QueryPerformanceFrequency, ADDR PerfCountFreq
inv QueryPerformanceCounter, ADDR LastCounter

; prog loop starts

         ;;; code here

inv QueryPerformanceCounter, ADDR EndCounter
mov ecx, LastCounter
mov eax, EndCounter
sub eax, ecx
mov edx, 1000
mov ElapsedCounter, eax
mul edx

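push eax
fild dword ptr [esp]
add esp, 4 ; release the temporary pushed above
fidiv PerfCountFreq
fstp MSPerFrame

mov eax, PerfCountFreq
xor edx, edx ; div is unsigned, so clear EDX instead of sign-extending with cdq
div dword ptr ElapsedCounter
mov tFPS, eax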

fld TargetSecPerFrame
fsub MSPerFrame
fistp SleepMS

cmp SleepMS, 0
jle done
inv Sleep, dword ptr [SleepMS]

done:


By natural selection, I must be running far behind...but who knows... any comments?


edit to organize the code
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 13, 2018, 10:17:11 PM
Quote
I'm dealing with dwords and doing the stuff in cpu prior to FPU. Marinus is going SIMD with dd, and JJ is full dq.

Yeah, everybody has his own coding style.  :badgrin:

PerfCountFreq, LastCounter, EndCounter should be QWORD size.
Now they overwrite each other and ElapsedCounter also.

Maybe better to keep it all in the FPU then you can also make a reciprocal of PerfCountFreq and get rid of the fidiv instruction and replace it with fmul.
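
A rough sketch of that idea, assuming the MASM32 SDK at \masm32 (the `FP8` macro comes from its macros.asm) and QWORD-sized counters; the variable names and the `Sleep` call are placeholders. The division happens once at start-up, and each frame then needs only a multiplication:

```asm
include \masm32\include\masm32rt.inc

.data?
perfFreq    dq ?
lastCounter dq ?
endCounter  dq ?
msPerTick   real8 ?                     ; 1000 / frequency, computed once
msPerFrame  dd ?                        ; rounded to whole milliseconds

.code
start:
    invoke QueryPerformanceFrequency, addr perfFreq
    fld  FP8(1000.0)
    fild perfFreq
    fdiv                                ; st(0) = 1000.0 / frequency
    fstp msPerTick

    invoke QueryPerformanceCounter, addr lastCounter
    invoke Sleep, 16                    ; stand-in for one frame of work
    invoke QueryPerformanceCounter, addr endCounter

    fild endCounter
    fild lastCounter
    fsub                                ; st(0) = end - last, in ticks
    fmul msPerTick                      ; ticks * (ms per tick), no division per frame
    fistp msPerFrame

    print str$(msPerFrame), " ms per frame", 13, 10
    exit
end start
```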
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 13, 2018, 10:26:40 PM
Quote
Maybe better to keep it all in the FPU then you can also make a reciprocal of PerfCountFreq and get rid of the fidiv instruction and replace it with fmul.

This is the Jochen! nice
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 13, 2018, 10:35:59 PM
Quote from: LordAdef on April 13, 2018, 10:26:40 PM
Quote
Maybe better to keep it all in the FPU then you can also make a reciprocal of PerfCountFreq and get rid of the fidiv instruction and replace it with fmul.

This is the Jochen! nice

Or the Marinus if you like SIMD  :biggrin:
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 13, 2018, 10:52:32 PM
I presume your goal is to use the timers for your Games am I right?

If you like it I could post an example of my multimedia timers.
It handles TotalTime, TimeElapsed, FramesPerSecond, FrameTimeDelta and 15 additional resettable timers for game events.
But it is written in SIMD.  :biggrin:
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 13, 2018, 11:24:19 PM
Quote from: Siekmanski on April 13, 2018, 10:52:32 PM
I presume your goal is to use the timers for your Games am I right?

If you like it I could post an example of my multimedia timers.
It handles TotalTime, TimeElapsed, FramesPerSecond, FrameTimeDelta and 15 additional resettable timers for game events.
But it is written in SIMD.  :biggrin:

Yes! And it's in the main loop, so it must be optimized. I would love if you could do that! Thanks Marinus
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on April 14, 2018, 12:43:12 AM
Multimedia timers in action.  :biggrin:
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on April 14, 2018, 12:52:59 AM
Quote from: Siekmanski on April 14, 2018, 12:43:12 AM
Multimedia timers in action.  :biggrin:


Nice!
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on May 16, 2018, 04:17:23 PM
Quote from: Siekmanski on April 12, 2018, 06:33:30 PM
I adjusted the code in my previous post.

The Update_frame proc would be something like this:


FrameCounter                dd 0
TimeCounter                 real4 0.0
FrameTimeCounter            real4 0.0
FramesPerSecond             real4 0.0


    invoke      QueryPerformanceCounter,addr FrameTimeNew
    mov         eax,FrameTimeNew.Low32bit
    mov         ecx,eax
    sub         eax,FrameTimeOld.Low32bit
    mov         FrameTimeOld.Low32bit,ecx
    cvtsi2ss    xmm0,eax
    mulss       xmm0,TicksPerSecondReciprocal
    movss       FrameTimeDelta,xmm0 ; FPS = 1 / FrameTimeDelta

    movss       xmm1,TimeCounter
    addss       xmm1,xmm0
    movss       TimeCounter,xmm1

    inc         FrameCounter
    movss       xmm1,FrameTimeCounter
    addss       xmm1,xmm0
    comiss      xmm1,FLT4(1.0)
    jb          PerSecond
    cvtsi2ss    xmm0,FrameCounter
    divss       xmm0,xmm1
    movss       FramesPerSecond,xmm0  ; update per second
    mov         FrameCounter,0       
    xorps       xmm1,xmm1
PerSecond:   
    movss       FrameTimeCounter,xmm1


Marinus and friends,

UASM is complaining about:

;comiss xmm1, FLT4(1.0)
Main.asm(281) : Error A2273: real or BCD number not allowed


It's the FLT4 macro. Any idea?
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on May 16, 2018, 05:13:41 PM
Try good ol' Masm32 FP4()
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on May 16, 2018, 05:40:09 PM
Or use it as a constant.

.const

fp1 real4 1.0

.code

    comiss      xmm1,fp1

Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: aw27 on May 16, 2018, 06:17:53 PM
FP4 is a "built-in" UASM macro
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on May 16, 2018, 06:53:11 PM
Quote from: aw27 on May 16, 2018, 06:17:53 PM
FP4 is an "built-in" UASM macro

No, it's from the Masm32 SDK (\masm32\macros\macros.asm):
    ; **********************************************************
    ; function style macros for direct insertion of data types *
    ; **********************************************************

      FP4 MACRO value
        LOCAL vname
        .data
        align 4
          vname REAL4 value
        .code
        EXITM <vname>
      ENDM

      FP8 MACRO value
        LOCAL vname
        .data
        align 4
          vname REAL8 value
        .code
        EXITM <vname>
      ENDM

      FP10 MACRO value
        LOCAL vname
        .data
        align 4
          vname REAL10 value
        .code
        EXITM <vname>
      ENDM


Usage:
include \masm32\include\masm32rt.inc

.code
start:
  int 3
  fld FP4(123.456)
  exit

end start
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: aw27 on May 16, 2018, 07:01:27 PM
Quote
No, it's from the Masm32 SDK (\masm32\macros\macros.asm):
When you have some time download UASM and read the uasm246_ext.pdf
Then try to make a project without using the "include \masm32\include\masm32rt.inc" (if you still remember how to do it of course).
To your surprise  :dazzled: you will see that UASM can figure out what FP4 is.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: HSE on May 16, 2018, 11:15:05 PM
Quote from: aw27 on May 16, 2018, 07:01:27 PM
read the uasm246_ext.pdf
You mean, to read the manual?  :biggrin:
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: aw27 on May 17, 2018, 12:03:56 AM
Yeap, RTFM ("Read The Funtastic Manual")  ;)
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on May 17, 2018, 07:34:44 AM
thanks everyone.


So, how about FLT4, where is this macro from?
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on May 17, 2018, 08:00:09 AM
You can find it in dx9macros.inc ( part of my direct3d9 sources )

FLT4 MACRO float_number:REQ
LOCAL float_num
   .data
    align 4
    float_num real4 float_number
   .code
   EXITM <float_num>
ENDM

FLT8 MACRO float_number:REQ
LOCAL float_num
   .data
    align 8
    float_num real8 float_number
   .code
   EXITM <float_num>
ENDM

Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on May 17, 2018, 03:29:06 PM
Out of curiosity: was it just to have a customized macro name?
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on May 17, 2018, 08:53:39 PM
Not really. I don't use the masm32rt.inc or the masm32\macros\macros.asm in my sources.
FP4 is not a masm standard. Don't know who came up with this kind of macro first.
I have been using the FLT4 and FLT8 macros for about 20 years now, and they are properly aligned.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on May 17, 2018, 10:34:15 PM
FP4 is aligned, too:
      FP4 MACRO value
        LOCAL vname
        .data
        align 4
          vname REAL4 value
        .code
        EXITM <vname>
      ENDM


I find this debate a bit academic. Without the Masm32 SDK, nobody would even know that MASM exists, or that writing Windows programs in Assembler is possible 8)

Certainly, this snippet works with UAsm. But it doesn't work with Masm. If, however, you enable the macros.asm line, it assembles with both. So what exactly is the added value of built-in FP? macros in UAsm?
.486                                      ; create 32 bit code
.model flat, stdcall                      ; 32 bit memory model
option casemap :none                      ; case sensitive

include \masm32\include\kernel32.inc
include \masm32\include\msvcrt.inc
; include \masm32\macros\macros.asm ; assembles with MASM and UAsm

includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\msvcrt.lib

.code
txFormat db "A double: %1.15f", 0
start:
  mov edi, offset FP8(0.0)
  fldpi
  fstp REAL8 PTR [edi]
  invoke crt_printf, addr txFormat, REAL8 PTR [edi]
  invoke ExitProcess, 0

end start
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on May 17, 2018, 11:09:15 PM
Quote
FP4 is aligned, too:

True, but FP8 isn't.

      FP8 MACRO value
        LOCAL vname
        .data
        align 4   ; <---- shouldn't this be align 8 ?
          vname REAL8 value
        .code
        EXITM <vname>
      ENDM

Quote
I find this debate a bit academic. Without the Masm32 SDK, nobody would even know that MASM exists, or that writing Windows programs in Assembler is possible 8)

You are totally right, but the question was: "So, how about FLT4, where is this macro from?" ( reply #59 )
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: aw27 on May 17, 2018, 11:55:16 PM
One problem is that we can't include macros.asm without including the other files.
It doesn't take much effort to start finding nuisances. For example, in JJ's carefully chosen example it is enough to change the calling convention to PASCAL to break the whole thing.

In UASM, by contrast, we can simply declare our own prototypes and use the built-in FP4/FP8 macros.

Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on May 18, 2018, 12:31:04 AM
Quote from: Siekmanski on May 17, 2018, 11:09:15 PM
Quote
FP4 is aligned, too:

True, but FP8 isn't.
...
        align 4   <---- shouldn't this be align 8 ?

Alignment is overrated IMHO 8)
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

465     cycles for 100 * align8
484     cycles for 100 * align4
487     cycles for 100 * misaligned

500     cycles for 100 * align8
494     cycles for 100 * align4
464     cycles for 100 * misaligned
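
A minimal sketch of the three cases such a test compares, assuming the Masm32 SDK is installed. The benchmark source is not shown in this thread, so the names qAlign8, qAlign4 and qOdd and the padding layout are hypothetical; in particular, "misaligned" is assumed here to mean an odd address.

```masm
include \masm32\include\masm32rt.inc

.data
align 8
qAlign8 REAL8 1.5       ; offset is a multiple of 8
        dd 0            ; 4 bytes of padding
qAlign4 REAL8 1.5       ; offset is 4 mod 8, which "align 4" can produce
        db 0            ; 1 byte of padding
qOdd    REAL8 1.5       ; odd offset (assumed meaning of "misaligned")

.code
start:
  fld qAlign8           ; the instruction is identical in all three cases,
  fstp st(0)            ; only the address of the operand differs
  fld qAlign4
  fstp st(0)
  fld qOdd
  fstp st(0)
  exit

end start
```

The fld itself never faults on a misaligned REAL8; any difference shows up only as timing, which is what the cycle counts above measure.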
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: Siekmanski on May 18, 2018, 12:59:40 AM
It seems it is, but can we trust cycle counters on modern PCs?  :biggrin:

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

561     cycles for 100 * align8
566     cycles for 100 * align4
565     cycles for 100 * misaligned

567     cycles for 100 * align8
564     cycles for 100 * align4
559     cycles for 100 * misaligned

571     cycles for 100 * align8
567     cycles for 100 * align4
559     cycles for 100 * misaligned

562     cycles for 100 * align8
562     cycles for 100 * align4
559     cycles for 100 * misaligned

563     cycles for 100 * align8
562     cycles for 100 * align4
564     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned


--- ok ---
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: daydreamer on May 18, 2018, 03:43:21 AM
Do you also have a UASM and MASM macro to replace the ugly, messy
mov eax,immediate integer
movd (x)mm0,eax

with some MOVD (x)mm0,immediate integer macro?
Macros for 64-bit, 128-bit etc. would be nice, too.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on May 18, 2018, 05:07:09 AM
Quote from: daydreamer on May 18, 2018, 03:43:21 AM
Do you also have a UASM and MASM macro to replace the ugly, messy
mov eax,immediate integer
movd (x)mm0,eax

with some MOVD (x)mm0,immediate integer macro?
Macros for 64-bit, 128-bit etc. would be nice, too.

No problem:

include \masm32\MasmBasic\MasmBasic.inc         ; download (http://masm32.com/board/index.php?topic=94.0)

movx MACRO xmmArg, immArg
  if (opattr immArg) ne 36 ; atImmediate
       .err <** needs an immediate arg **>
  endif
  push immArg
  movd xmmArg, dword ptr [esp]
  add esp, 4
ENDM

  Init
  movx xmm2, 12345678h
  deb 1, "Result:", x:xmm2
EndOfCode


Doesn't trash eax, and works with ordinary non-MasmBasic code, too.
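
A minimal sketch of the same macro in a plain Masm32 SDK program without MasmBasic, assuming the SDK's print and hex$ macros from macros.asm. The .686/.xmm lines are needed for movd with an xmm register.

```masm
include \masm32\include\masm32rt.inc
.686
.xmm

movx MACRO xmmArg, immArg
  if (opattr immArg) ne 36 ; atImmediate
       .err <** needs an immediate arg **>
  endif
  push immArg                   ; immediate goes to the stack, eax stays untouched
  movd xmmArg, dword ptr [esp]  ; load the low dword of the xmm register
  add esp, 4                    ; restore the stack
ENDM

.code
start:
  movx xmm2, 12345678h
  movd eax, xmm2                ; copy back only to display the value
  print hex$(eax), 13, 10
  exit

end start
```

Build it as a console application, otherwise the print output is not visible.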
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: HSE on May 18, 2018, 05:35:03 AM
Always making noise!!
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)

759     cycles for 100 * align8
1628    cycles for 100 * align4
755     cycles for 100 * misaligned

760     cycles for 100 * align8
1642    cycles for 100 * align4
755     cycles for 100 * misaligned

760     cycles for 100 * align8
1629    cycles for 100 * align4
764     cycles for 100 * misaligned

761     cycles for 100 * align8
1629    cycles for 100 * align4
755     cycles for 100 * misaligned

765     cycles for 100 * align8
1642    cycles for 100 * align4
755     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned

--- ok ---


I don't know anything about processor architecture, but I think the AMD FPU is a RISC chip. In RISC processors, alignment is apparently critical.

It looks like the assembler aligns qwords to 8 by default. Or not  :biggrin: Again, what is happening here?
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: zedd151 on May 18, 2018, 06:02:22 AM
Win 10 Home, 64 bit    1.6  Ghz



AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

962     cycles for 100 * align8
1023    cycles for 100 * align4
939     cycles for 100 * misaligned

1003    cycles for 100 * align8
1020    cycles for 100 * align4
979     cycles for 100 * misaligned

991     cycles for 100 * align8
1047    cycles for 100 * align4
1012    cycles for 100 * misaligned

939     cycles for 100 * align8
1104    cycles for 100 * align4
948     cycles for 100 * misaligned

949     cycles for 100 * align8
1064    cycles for 100 * align4
948     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned


Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: zedd151 on May 18, 2018, 06:26:37 AM
a little while later...


AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G   (SSE4)

877     cycles for 100 * align8
943     cycles for 100 * align4
877     cycles for 100 * misaligned

947     cycles for 100 * align8
943     cycles for 100 * align4
878     cycles for 100 * misaligned

877     cycles for 100 * align8
1042    cycles for 100 * align4
878     cycles for 100 * misaligned

876     cycles for 100 * align8
1035    cycles for 100 * align4
875     cycles for 100 * misaligned

876     cycles for 100 * align8
1064    cycles for 100 * align4
883     cycles for 100 * misaligned

12      bytes for align8
12      bytes for align4
12      bytes for misaligned




this computer doesn't seem to like align 4   
HSE's really doesn't like it.   :P
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: LordAdef on May 18, 2018, 07:49:34 AM
These benchmarks are sometimes rather interesting, since we often don't reach any common conclusion.

But aligning is so simple that I don't mind doing it anyway
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on May 18, 2018, 10:14:54 AM
Quote from: LordAdef on May 18, 2018, 07:49:34 AM
But aligning is so simple that I don't mind doing it anyway

If the code gets any faster with alignment, it makes sense in an innermost loop with a Million iterations. Otherwise it bloats your exe, pollutes the data cache, and thus may slow down the whole program.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: daydreamer on May 18, 2018, 08:55:51 PM
Quote from: jj2007 on May 18, 2018, 10:14:54 AM
Quote from: LordAdef on May 18, 2018, 07:49:34 AM
But aligning is so simple that I don't mind doing it anyway

If the code gets any faster with alignment, it makes sense in an innermost loop with a Million iterations. Otherwise it bloats your exe, pollutes the data cache, and thus may slow down the whole program.
Thanks for the macro, jj.
Thanks also for giving me an idea for a timing test:
Align the data to 16 for SSE code, so you can easily use mulps, divps etc. with variables in memory.
Versus: when you cannot use aligned memory data with SIMD, you instead use lots of movups before the inner loop, and the inner loop makes use of all 16 xmm regs in 64-bit mode, so every mulps etc. is reg to reg and all variables are kept in xmm regs.
And run this test several million times.
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: jj2007 on May 18, 2018, 09:48:28 PM
Note that your testcase is different: fld alignedVariable vs fld unalignedVariable uses identical instructions. When using SIMD instructions that throw exceptions, you need additional instructions, and that may cost cycles, of course. Test it... GetTickCount is your friend ;)
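
A minimal sketch of such a test with GetTickCount, assuming the Masm32 SDK and a console build. The names vecA, vecU and LOOPS are hypothetical, and the loops are deliberately tiny, so treat the numbers as a rough indication only.

```masm
include \masm32\include\masm32rt.inc
.686
.xmm

LOOPS equ 100000000             ; hypothetical iteration count

.data
align 16
vecA  REAL4 1.0, 2.0, 3.0, 4.0  ; 16-byte aligned
      db 0                      ; shifts the next item by one byte
vecU  REAL4 1.0, 2.0, 3.0, 4.0  ; deliberately misaligned

.code
start:
  mov esi, offset vecA
  mov edi, offset vecU

  ; aligned data: movaps is safe, mulps takes the memory operand directly
  invoke GetTickCount
  mov ebx, eax
  mov ecx, LOOPS
@@:
  movaps xmm0, [esi]
  mulps xmm0, [esi]
  dec ecx
  jnz @B
  invoke GetTickCount
  sub eax, ebx
  print str$(eax), " ms aligned", 13, 10

  ; misaligned data: movaps or mulps with [edi] would fault,
  ; so it costs extra movups loads and a reg-to-reg mulps
  invoke GetTickCount
  mov ebx, eax
  mov ecx, LOOPS
@@:
  movups xmm0, [edi]
  movups xmm1, [edi]
  mulps xmm0, xmm1
  dec ecx
  jnz @B
  invoke GetTickCount
  sub eax, ebx
  print str$(eax), " ms misaligned", 13, 10
  exit

end start
```

GetTickCount has a coarse resolution, typically in the range of 10 to 16 ms, so the iteration count has to be large enough for each loop to run for a second or so.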
Title: Re: Help with QueryPerformanceCounter and 64 bit numbers
Post by: daydreamer on May 19, 2018, 07:16:28 AM
Quote from: jj2007 on May 18, 2018, 09:48:28 PM
Note that your testcase is different: fld alignedVariable vs fld unalignedVariable uses identical instructions. When using SIMD instructions that throw exceptions, you need additional instructions, and that may cost cycles, of course. Test it... GetTickCount is your friend ;)
My C++ exercise forces me to use movups and mulps xmmreg,xmmreg in inline asm, so I could just as well try that different solution for the inner loop.