I have been trying to understand more about the macros in masm32\macros\timers.asm. In particular, I have been looking at the macros that calculate milliseconds, and more particularly still, the timer_end macro. At the end of the timer_end macro there are nine lines of code: two mov instructions and seven floating point instructions. Two fild instructions push __timer__pc__count__ and __timer__pc__frequency__ onto the FPU stack, then fdiv is performed. I believe this operation is dividing the pc frequency by the total cycle count. Unless I am crazy, I think it should divide the total cycle count by the frequency.
Please tell me I am not crazy.
Thanks, Jim
you are crazy ! :badgrin:
it is dividing the total number of clock cycles by the number of loop passes
on each pass through the test code, it totalizes clock cycles
when the loop count has expired, the number of clock cycles for a single pass is calculated
Thanks for your reply! Dividing the total number of clock cycles by the number of loop passes is what it SHOULD be doing. BUT the devil is in the details. __timer__pc__frequency__ is obtained from the QueryPerformanceFrequency function which returns PC clock cycles per second. And __timer__pc__count__ is as you say, the total clock cycles for execution of the timed code. And I believe if you look at the push instructions followed by the fdiv instruction, you will see that pc clock cycles is being divided by total cycles. Please show me why I am crazy.
Regards, Jim
Quote from: hawkeye62 on June 05, 2013, 10:43:51 AM
QueryPerformanceFrequency function which returns PC clock cycles per second
no. -EDIT- I've interpreted "PC clock cycles" as CPU frequency :icon_eek:
Windows uses programmable hardware timers for the performance counter; their frequency is independent of the CPU's.
You can read out the CPU frequency from the registry:
LOCAL hKey:HKEY,freq:REAL8,qMHz:QWORD,_size:DWORD
...
mov _size,4
.if rvx(esi = RegOpenKeyEx,HKEY_LOCAL_MACHINE,"HARDWARE\DESCRIPTION\System\CentralProcessor\0",0,KEY_READ,&hKey) != ERROR_SUCCESS || \
rvx(RegQueryValueEx,hKey,"~MHz",0,0,&qMHz, &_size) != ERROR_SUCCESS
; error: can't read CPU frequency from registry
.endif
invoke RegCloseKey,hKey
mov DWORD ptr qMHz[4],0
fild qMHz
fmul FP8(1.0E6)
fstp freq ; [1/s]
lol
you're probably right
let me have a look at it - it's been a while
yah
that one uses the high-resolution counter (QueryPerformanceCounter)
we rarely use that one
we generally use the first set of macros, counter_begin and counter_end
RDTSC is about 10 times faster than QueryPerformanceCounter
counter_end divides the cycle count by the pass count :P
In the timer_end macro:
finit
fild __timer__pc__count__
fild __timer__pc__frequency__
fdiv
mov __timer__dw_count__, 1000
fild __timer__dw_count__
fmul
fistp __timer__dw_count__
Because the FDIV instruction has no operands, it is encoded as FDIVP ST(1), ST(0), so it divides ST(1) (__timer__pc__count__) by ST(0) (__timer__pc__frequency__) to calculate the elapsed seconds for the entire loop, and then the FMUL instruction multiplies the result by 1000 to convert it to elapsed milliseconds.
All of the references I have seen say that fdiv with no operands divides st(0) by st(1).
Regards, Jim
;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
.data
ten REAL8 10.0
five REAL8 5.0
r8 REAL8 ?
.code
;==============================================================================
start:
;==============================================================================
fld ten
fld five
fdiv
fstp r8
printf("%f\n", r8)
fld ten
fld five
fdivr
fstp r8
printf("%f\n", r8)
inkey
exit
;==============================================================================
END start
00401000 start:
00401000 DD0500304000 fld qword ptr [off_00403000]
00401006 DD0508304000 fld qword ptr [off_00403008]
0040100C DEF9 fdivp st(1),st
0040100E DD1D10304000 fstp qword ptr [off_00403010]
00401014 FF3514304000 push dword ptr [off_00403014]
0040101A FF3510304000 push dword ptr [off_00403010]
00401020 6818304000 push offset off_00403018 ; '%f',00Dh,00Ah,000h
00401025 FF1520204000 call dword ptr [printf]
0040102B 83C40C add esp,0Ch
0040102E DD0500304000 fld qword ptr [off_00403000]
00401034 DD0508304000 fld qword ptr [off_00403008]
0040103A DEF1 fdivrp st(1),st
2.000000
0.500000
And from a recent Intel manual for FDIV/FDIVP/FIDIV-Divide:
Quote
The no-operand version of this instruction divides the contents of the ST(1) register by the contents of the ST(0) register.
...
The FDIVP instructions perform the additional operation of popping the FPU register stack after storing the result.
...
In some assemblers, the mnemonic for this instruction is FDIV rather than FDIVP.
Here is a direct quote from "Art of Assembly" Chapter 14 at cs.smith.edu.
"With zero operands, the fdiv and fdivp instructions pop st(0) and st(1), compute st(0)/st(1), and push the result back onto the stack. The fdivr and fdivrp instructions also pop st(0) and st(1) but compute st(1)/st(0) before pushing the quotient onto the stack."
So much for "Art of Assembly".
Thanks for the help, crazy jim :icon_redface:
you can download the intel manuals for info
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
also, Raymond has a nice FPU tutorial that explains each instruction...
http://www.ray.masmcode.com/
I would suggest using AMD's documentation, because the manuals are separated by instruction set: http://support.amd.com/us/Processor_TechDocs/26569_APM_v5.pdf
OK guys. Thanks very much for the patience and the help.
Regards, crazy Jim