Easy one or may be not

aw27 · January 19, 2018, 05:34:50 AM

How to swap the contents of rax and rdx using only logical instructions?

Vortex · January 19, 2018, 06:45:45 AM

xor rax,rdx
xor rdx,rax
xor rax,rdx

https://en.wikipedia.org/wiki/XOR_swap_algorithm

https://betterexplained.com/articles/swap-two-variables-using-xor/

aw27 · January 19, 2018, 01:44:32 PM

:t (

)

jj2007 · January 19, 2018, 02:00:32 PM

Quote from: Vortex on January 19, 2018, 06:45:45 AM
Code Select Expand
xor rax,rdx xor rdx,rax xor rax,rdx

It works, but where is the advantage? Xchg is shorter and twice as fast, see attachment.

Code Select

include \Masm32\MasmBasic\Res\JBasic.inc
Init		; OPT_64 1	; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
  mov rax, 123
  mov rdx, 456
  usedeb=1
  deb 4, "start", rax, rdx
  xor rax, rdx
  xor rdx, rax
  xor rax, rdx
  deb 4, "xored", rax, rdx
  xchg rax, rdx
  deb 4, "xchged", rax, rdx
EndOfCode

Output:

Code Select

start
rax     123
rdx     456
xored
rax     456
rdx     123
xchged
rax     123
rdx     456

aw27 · January 19, 2018, 02:11:34 PM

Quote from: jj2007 on January 19, 2018, 02:00:32 PM
It works, but where is the advantage?

The advantage is to become aware that is good to have a CISC processor and not a RISC one.

jj2007 · January 19, 2018, 02:20:52 PM

So processors that can xor regs are RISC, those who can xchg regs are CISC?

aw27 · January 19, 2018, 02:39:19 PM

Quote from: jj2007 on January 19, 2018, 02:20:52 PM
So processors that can xor regs are RISC, those who can xchg regs are CISC?

No, things are more complicated. Actually current Intel processors are CISC outside and RISC inside, they use microcode for complex instructions. So when you look at an instruction like xchg you may not realize that 3 xors took place.

jj2007 · January 19, 2018, 07:16:59 PM

Quote from: aw27 on January 19, 2018, 02:39:19 PMSo when you look at an instruction like xchg you may not realize that 3 xors took place.

Perhaps, but the CPU has also lots of shadow registers, so it might as well use a temporary reg, see third option below. In any case, the "inside" micro-coding of xchg rax, rdx is more than twice as fast as the "outside" RISC coding via three xor instructions:

Code Select

3*xor
rax     123
rdx     456
ticks 3*xor     rax     6630
xchg
rax     123
rdx     456
ticks 1*xchg    rax     2543
3*mov
rax     123
rdx     456
ticks 3*mov     rax     2528
This code was assembled with ml64 in 64-bit format

Timings on a Core i5. I have added a third option:

Code Select

@@:
  mov r8, rdx
  mov rdx, rax
  mov rax, r8
  dec rcx
  jne @B

Options 2+3 are a factor 2.6 faster than the 3*xor sequence, corrected for loop overhead (see attached project, *.asc opens in WordPad).

aw27 · January 19, 2018, 08:37:56 PM

I believe that xchg might be faster, simply because it would be a shame for Intel/AMD if it could be replaced with advantage by a trio of xors.
However, I don't believe much in speed tests, this has been discussed many times before.

jj2007 · January 19, 2018, 09:28:15 PM

Quote from: aw27 on January 19, 2018, 08:37:56 PMI don't believe much in speed tests, this has been discussed many times before.

Sure, they are devil's work! Better to have a theory, empirics is for the Warmduscher fraction

Core i5:

Code Select

ticks 3*xor     rax     3322
ticks 1*xchg    rax     1294
ticks 3*mov     rax     1247

ticks 3*xor     rax     3339
ticks 1*xchg    rax     1295
ticks 3*mov     rax     1248

ticks 3*xor     rax     3308
ticks 1*xchg    rax     1295
ticks 3*mov     rax     1279

ticks 3*xor     rax     3323
ticks 1*xchg    rax     1280
ticks 3*mov     rax     1279

Intel(R) Celeron(R) CPU N2840 @ 2.16GHz:

Code Select

ticks 3*xor     rax     7047
ticks 1*xchg    rax     2797
ticks 3*mov     rax     2843

ticks 3*xor     rax     6001
ticks 1*xchg    rax     3109
ticks 3*mov     rax     3156

ticks 3*xor     rax     6171
ticks 1*xchg    rax     3390
ticks 3*mov     rax     3124

ticks 3*xor     rax     7000
ticks 1*xchg    rax     3578
ticks 3*mov     rax     3094

aw27 · January 19, 2018, 09:47:40 PM

Quote
empirics is for the Warmduscher fraction

OK, I will make a note. :t

The MASM Forum

News:

Easy one or may be not

aw27

Vortex

aw27

jj2007

aw27

jj2007

aw27

jj2007

aw27

jj2007

aw27