Timing your code: what's faster, mov eax, 123 or mov rax, 123?

Started by jj2007, August 24, 2023, 11:26:55 PM

jj2007

Here is a fairly straightforward example of how to time your code. Source and exe are attached; you need the Masm64 SDK to build it.

It compares two instructions, using two different but equivalent settings:
a) 1024000 tests of 1024 mov... instructions
b) 1024000/2 tests of 1024*2 of the same mov... instructions

1000 mega iterations, 1024 instructions
284 megacycles for mov eax,123
276 megacycles for mov rax,123

284 megacycles for mov eax,123
292 megacycles for mov rax,123

278 megacycles for mov eax,123
277 megacycles for mov rax,123

1000 mega iterations, 2048 instructions
277 megacycles for mov eax,123
324 megacycles for mov rax,123

279 megacycles for mov eax,123
330 megacycles for mov rax,123

274 megacycles for mov eax,123
327 megacycles for mov rax,123
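As a rough cross-check of the figures above, the following sketch redoes the arithmetic in Python. It assumes only what the post states: the loop settings, the division by 1048576 implied by sar rax, 20, and two of the reported megacycle counts. The helper names are made up for illustration, and the small loop overhead (inc, cmp, jnz) is counted as part of the mov cost.

```python
# Figures taken from the post: loop settings and reported results.
MEGA = 1 << 20  # the program scales by 1048576 (sar rax, 20), not by 1000000

def total_movs(instructions, tests):
    """Number of mov instructions executed by one timed loop."""
    return instructions * tests

run_a = total_movs(1024, 1024000)   # setting a)
run_b = total_movs(2048, 512000)    # setting b): 1024*2 instructions, 1024000/2 tests

def cycles_per_mov(megacycles, movs):
    """Convert a reported megacycle count back to cycles per mov."""
    return megacycles * MEGA / movs

print(run_a == run_b)                 # both settings execute the same number of movs
print(run_a // MEGA)                  # the "mega iterations" figure the program prints
print(cycles_per_mov(284, run_a))     # mov eax, first result of the 1024 run
print(cycles_per_mov(324, run_b))     # mov rax, first result of the 2048 run
```

Both settings do the same amount of work, so any difference between the two blocks of results comes from how the work is laid out, not from how much of it there is.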

Here is the source. On top is a macro that...
- with no args, i.e. tCycles, loads the initial cycle count into rbx
- with a string arg, i.e. tCycles mov eax, 123, subtracts rbx from the final count and displays the difference
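For readers who prefer to see the arithmetic outside of a macro, here is a minimal Python sketch of what the two modes compute. It is not a timer: the register values are hypothetical and chosen only for illustration, and the function names are made up. It only mirrors the shl/or combination of edx:eax and the shift by 20 bits.

```python
def combine_tsc(edx, eax):
    """rdtsc returns the counter split across edx (high half) and eax (low half)."""
    return (edx << 32) | eax        # same effect as: shl rdx, 32 / or rax, rdx

def megacycles(start, end):
    """Elapsed count scaled the way the macro does it (sar rax, 20)."""
    return (end - start) >> 20

# Hypothetical register values, not measurements.
start = combine_tsc(0x00000012, 0x00000000)   # "tCycles" with no args: store in rbx
end = combine_tsc(0x00000012, 0x11C00000)     # "tCycles mov eax, 123": read again
print(megacycles(start, end))                 # prints 284
```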

include \masm64\include64\masm64rt.inc    ; *** standard Masm64 SDK code ***

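tCycles MACRO arg:VARARG
  rdtsc                 ; time-stamp counter -> edx:eax
  shl rdx, 32           ; move the high half into bits 32..63
  or rax, rdx           ; rax = full 64-bit count
  ifb <arg>             ; called without text: start of a measurement
    mov rbx, rax        ; remember the start count in rbx
  else                  ; called with text: end of a measurement
    sub rax, rbx        ; elapsed cycles since the start count
    sar rax, 20         ; divide by 2^20 = 1048576 to get "megacycles"
    invoke __imp__cprintf, cfm$("%i megacycles for &arg&\n"), rax
  endif
ENDM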

.code
entry_point proc
  instructions=1024
  tests=1024000
  invoke __imp__cprintf, cfm$("%i mega iterations, %i instructions\n"), instructions*tests/1048576, instructions
  REPEAT 3
    tCycles
    xor ecx, ecx
    align 4
    @@: REPEAT instructions
            mov eax, 123456789    ; 5 bytes
        ENDM
        inc ecx
        cmp ecx, tests
        jnz @B
    tCycles mov eax, 123    ; end of test, print "xx megacycles for mov eax, 123"
    tCycles
    xor ecx, ecx
    align 4
    @@: REPEAT instructions
            mov rax, 123456789    ; 7 bytes
        ENDM
        inc ecx
        cmp ecx, tests
        jnz @B
    tCycles mov rax, 123
    invoke __imp__cprintf, cfm$("\n")
  ENDM
  instructions=instructions*2
  tests=tests/2
  invoke __imp__cprintf, cfm$("%i mega iterations, %i instructions\n"), instructions*tests/1048576, instructions
  REPEAT 3
    tCycles
    xor ecx, ecx
    align 4
    @@: REPEAT instructions
            mov eax, 123456789    ; 5 bytes
        ENDM
        inc ecx
        cmp ecx, tests
        jnz @B
    tCycles mov eax, 123    ; end of test, print "xx megacycles for mov eax, 123"
    tCycles
    xor ecx, ecx
    align 4
    @@: REPEAT instructions
            mov rax, 123456789    ; 7 bytes
        ENDM
        inc ecx
        cmp ecx, tests
        jnz @B
    tCycles mov rax, 123
    invoke __imp__cprintf, cfm$("\n")
  ENDM
  invoke __imp_MessageBoxA, 0, chr$("Now guess why the second run is slower for mov rax"), chr$("Mysteries of the cpu:"), MB_OK
  invoke ExitProcess, 0                        ; terminate process
entry_point endp
end

Now the question is obviously, "why is the second run slower for mov rax, 123?"

The answer is simple, but you won't find it on the Internet ;-)
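One thing that does change between the two runs is the size of the unrolled loop body. The sketch below works that out from the instruction sizes given in the source comments (5 and 7 bytes). It is offered as background arithmetic only, not as the answer the author has in mind. The figures exclude the inc/cmp/jnz tail and any alignment padding, and the function name is made up for illustration.

```python
MOV_EAX_BYTES = 5   # mov eax, 123456789 (size from the source comment)
MOV_RAX_BYTES = 7   # mov rax, 123456789 (size from the source comment)

def body_bytes(instructions, bytes_per_mov):
    """Size in bytes of the unrolled REPEAT block, loop tail excluded."""
    return instructions * bytes_per_mov

for instructions in (1024, 2048):
    print(instructions,
          body_bytes(instructions, MOV_EAX_BYTES),
          body_bytes(instructions, MOV_RAX_BYTES))
```

This gives 5120 and 7168 bytes for the 1024-instruction run, and 10240 and 14336 bytes for the 2048-instruction run.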

NoCforMe

Interesting. However, I wonder if you'd be so kind as to maybe post some code that doesn't include fancy macros and other baggage, for those among us, like myself, who are macro-averse.

I see that the most important part of this is simply the RDTSC instruction, which reads the current time-stamp counter. So this could be vastly simplified to make it more understandable, I think.
Assembly language programming should be fun. That's why I do it.

jj2007

Folks,

I am really surprised that my post has been moved. I wrote a very, very simple program showing how to time code. It's 100 times simpler than some other recent stuff I've seen in the Campus. Hutch would never have moved such simple stuff away from the Campus.

And yes, it does contain a macro. N00bs working with a macro assembler should see what they are good for - and here they have a very clear and simple function. If you feel challenged by 10 lines of a simple macro: NASM and FASM are good bare metal assemblers. No need for MASM.

Quote from: NoCforMe on August 25, 2023, 05:56:31 AM
I see that the most important part of this is simply the RDTSC instruction

No.

zedd151

Quote
Folks,

I am really surprised that my post has been moved.

Everyone who has been following this saga can clearly see that this topic is part of your feud with mineiro from here: https://masm32.com/board/index.php?topic=11176.0 which was split off from a topic within the Campus: https://masm32.com/board/index.php?topic=11165.0

mineiro's topic, testp, is also part of that feud, and I moved that one from the Campus to the Laboratory as well.

Word your comments about the move any way you like, but it does not change what has already transpired in the Campus between yourself and mineiro. Moving these topics was done to curtail the continuation of that feud within the Campus.

jj2007

I have made an effort to provide a simple example, with source, and using Hutch's Masm64 SDK, of how to time code. It was meant for the Campus. I am sorry that it didn't find the approval of Hutch 2.0 :cool:

Quote from: jj2007 on August 24, 2023, 11:26:55 PM
Now the question is obviously, "why is the second run slower for mov rax, 123?"

It is interesting that nobody seems interested in finding out or discussing why mov rax, under certain conditions, is slower than mov eax.

stoo23

Well, whilst obviously Not being a Programmer, (and with the Greatest of respect), may I suggest that even as a mere 'observer', perhaps the Campus is NOT the Correct place to have 'Placed' your Code example, no matter How useful and/or correct it may be.

One could argue the same 'Usefulness', about Many Code fragments and examples placed throughout this Forum.
From the 'Descriptor' for the Campus:
Quote
The Campus
A protected forum where programmers learning assembler can ask questions in a sensible and safe atmosphere without being harassed or insulted. This is also targeted at experienced programmers from other languages learning assembler that don't want to be treated like kids.
I would suggest the Campus, is NOT the Correct Board to (in effect), 'Present' Code or Programs. Perhaps, in reply to a Question or request etc but NOT 'In the First Instance'.
I have had a cursory look here and on the Old UK site for examples of where this may have occurred (and am Not suggesting there aren't Any), but have not actually found any.

Given the Code you have 'Presented', and specifically its Title, "Timing your code: what's faster", it would seem to Me to be well placed in / moved to the Laboratory.
Quote
The Laboratory
This is the place to post assembler algorithms and code design for discussion, optimization and any other improvements that can be made on it. Post code here to be beaten to death to make it better, smaller, faster or more powerful.

Even as a Non Skilled Newcomer here, I am somewhat intrigued by much of the recent activity (not only by yourself) that has occurred in the Campus, and can only suggest, from my own understanding, let alone what others have suggested, that it would NOT have been tolerated by Hutch.

Please understand I don't want to offend or upset anyone here, but am simply offering MY observations  :smiley:

jj2007

Quote from: stoo23 on August 25, 2023, 04:01:53 PM
I would suggest the Campus, is NOT the Correct Board to (in effect), 'Present' Code or Programs. Perhaps, in reply to a Question or request etc but NOT 'In the First Instance'.

Point taken. Hutch most probably would not have moved a simple "teaching" example, but your logic is correct, Stewart.

stoo23