How to swap the contents of rax and rdx using only logical instructions?
xor rax,rdx
xor rdx,rax
xor rax,rdx
https://en.wikipedia.org/wiki/XOR_swap_algorithm
https://betterexplained.com/articles/swap-two-variables-using-xor/
:t ( :biggrin:)
Quote from: Vortex on January 19, 2018, 06:45:45 AM
xor rax,rdx
xor rdx,rax
xor rax,rdx
It works, but where is the advantage? Xchg is shorter and twice as fast, see attachment.
include \Masm32\MasmBasic\Res\JBasic.inc
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
mov rax, 123
mov rdx, 456
usedeb=1
deb 4, "start", rax, rdx
xor rax, rdx
xor rdx, rax
xor rax, rdx
deb 4, "xored", rax, rdx
xchg rax, rdx
deb 4, "xchged", rax, rdx
EndOfCode
Output:
start
rax 123
rdx 456
xored
rax 456
rdx 123
xchged
rax 123
rdx 456
Quote from: jj2007 on January 19, 2018, 02:00:32 PM
It works, but where is the advantage?
The advantage is to become aware that is good to have a CISC processor and not a RISC one.
So processors that can xor regs are RISC, those who can xchg regs are CISC?
Quote from: jj2007 on January 19, 2018, 02:20:52 PM
So processors that can xor regs are RISC, those who can xchg regs are CISC?
No, things are more complicated. Actually current Intel processors are CISC outside and RISC inside, they use microcode for complex instructions. So when you look at an instruction like xchg you may not realize that 3 xors took place.
Quote from: aw27 on January 19, 2018, 02:39:19 PMSo when you look at an instruction like xchg you may not realize that 3 xors took place.
Perhaps, but the CPU has also lots of shadow registers, so it might as well use a temporary reg, see third option below. In any case, the "inside" micro-coding of xchg rax, rdx is more than twice as fast as the "outside" RISC coding via three xor instructions:
3*xor
rax 123
rdx 456
ticks 3*xor rax 6630
xchg
rax 123
rdx 456
ticks 1*xchg rax 2543
3*mov
rax 123
rdx 456
ticks 3*mov rax 2528
This code was assembled with ml64 in 64-bit format
Timings on a Core i5. I have added a third option:
@@:
mov r8, rdx
mov rdx, rax
mov rax, r8
dec rcx
jne @B
Options 2+3 are a factor 2.6 faster than the 3*xor sequence, corrected for loop overhead (see attached project, *.asc opens in WordPad).
I believe that xchg might be faster, simply because it would be a shame for Intel/AMD if it could be replaced with advantage by a trio of xors.
However, I don't believe much in speed tests, this has been discussed many times before.
Quote from: aw27 on January 19, 2018, 08:37:56 PMI don't believe much in speed tests, this has been discussed many times before.
Sure, they are devil's work! Better to have a theory, empirics is for the Warmduscher fraction :greensml:
Core i5:
ticks 3*xor rax 3322
ticks 1*xchg rax 1294
ticks 3*mov rax 1247
ticks 3*xor rax 3339
ticks 1*xchg rax 1295
ticks 3*mov rax 1248
ticks 3*xor rax 3308
ticks 1*xchg rax 1295
ticks 3*mov rax 1279
ticks 3*xor rax 3323
ticks 1*xchg rax 1280
ticks 3*mov rax 1279
Intel(R) Celeron(R) CPU N2840 @ 2.16GHz:
ticks 3*xor rax 7047
ticks 1*xchg rax 2797
ticks 3*mov rax 2843
ticks 3*xor rax 6001
ticks 1*xchg rax 3109
ticks 3*mov rax 3156
ticks 3*xor rax 6171
ticks 1*xchg rax 3390
ticks 3*mov rax 3124
ticks 3*xor rax 7000
ticks 1*xchg rax 3578
ticks 3*mov rax 3094
Quote
empirics is for the Warmduscher fraction :greensml:
OK, I will make a note. :t