Author Topic: Easy one or may be not  (Read 1040 times)

AW

  • Member
  • *****
  • Posts: 1496
  • Let's Make ASM Great Again!
Easy one or may be not
« on: January 19, 2018, 05:34:50 AM »
How to swap the contents of rax and rdx using only logical instructions?

Vortex

  • Member
  • *****
  • Posts: 1850

AW

Re: Easy one or may be not
« Reply #2 on: January 19, 2018, 01:44:32 PM »
 :t ( :biggrin:)

jj2007

  • Member
  • *****
  • Posts: 8773
  • Assembler is fun ;-)
    • MasmBasic
Re: Easy one or may be not
« Reply #3 on: January 19, 2018, 02:00:32 PM »
Code: [Select]
xor rax,rdx
xor rdx,rax
xor rax,rdx

It works, but where is the advantage? Xchg is shorter and twice as fast, see attachment.
Code: [Select]
include \Masm32\MasmBasic\Res\JBasic.inc
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
  mov rax, 123
  mov rdx, 456
  usedeb=1
  deb 4, "start", rax, rdx
  xor rax, rdx
  xor rdx, rax
  xor rax, rdx
  deb 4, "xored", rax, rdx
  xchg rax, rdx
  deb 4, "xchged", rax, rdx
EndOfCode

Output:
Code: [Select]
start
rax     123
rdx     456
xored
rax     456
rdx     123
xchged
rax     123
rdx     456
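
Why the three xors swap the registers: XOR is its own inverse, so after the first instruction rax holds a^b, the second turns rdx into b^(a^b) = a, and the third turns rax into (a^b)^a = b. Below is a minimal C sketch of the same three steps, not taken from the attachment. The name `xor_swap` and the pointer interface are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Swap two 64-bit values with three XORs, mirroring
   xor rax,rdx / xor rdx,rax / xor rax,rdx.
   xor_swap is a hypothetical helper name, not from the thread. */
void xor_swap(uint64_t *a, uint64_t *b)
{
    assert(a && b);
    if (a == b)
        return;      /* same object: the first XOR would zero it */
    *a ^= *b;        /* a = a0 ^ b0                 */
    *b ^= *a;        /* b = b0 ^ (a0 ^ b0) = a0     */
    *a ^= *b;        /* a = (a0 ^ b0) ^ a0 = b0     */
}
```

Calling `xor_swap(&x, &y)` with x = 123 and y = 456 leaves x = 456 and y = 123, matching the "xored" output above. The `a == b` guard only matters for the pointer version; two distinct registers cannot alias.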

AW

Re: Easy one or may be not
« Reply #4 on: January 19, 2018, 02:11:34 PM »
Quote
It works, but where is the advantage?
The advantage is becoming aware that it is good to have a CISC processor and not a RISC one.

jj2007

Re: Easy one or may be not
« Reply #5 on: January 19, 2018, 02:20:52 PM »
So processors that can xor regs are RISC, those that can xchg regs are CISC?

AW

Re: Easy one or may be not
« Reply #6 on: January 19, 2018, 02:39:19 PM »
Quote
So processors that can xor regs are RISC, those that can xchg regs are CISC?
No, things are more complicated. Actually, current Intel processors are CISC on the outside and RISC on the inside: they use microcode for complex instructions. So when you look at an instruction like xchg you may not realize that 3 xors took place.

jj2007

Re: Easy one or may be not
« Reply #7 on: January 19, 2018, 07:16:59 PM »
Quote
So when you look at an instruction like xchg you may not realize that 3 xors took place.

Perhaps, but the CPU also has lots of shadow registers, so it might just as well use a temporary reg (see the third option below). In any case, the "inside" micro-coding of xchg rax, rdx is more than twice as fast as the "outside" RISC coding via three xor instructions:
Code: [Select]
3*xor
rax     123
rdx     456
ticks 3*xor     rax     6630
xchg
rax     123
rdx     456
ticks 1*xchg    rax     2543
3*mov
rax     123
rdx     456
ticks 3*mov     rax     2528
This code was assembled with ml64 in 64-bit format

Timings on a Core i5. I have added a third option:
Code: [Select]
@@:
  mov r8, rdx
  mov rdx, rax
  mov rax, r8
  dec rcx
  jne @B

Options 2 and 3 are a factor of 2.6 faster than the 3*xor sequence, corrected for loop overhead (see attached project, *.asc opens in WordPad).
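
The factor can be checked against the tick counts printed in the listing above. Below is a small C sketch of that arithmetic. The function name `speedup` and the `overhead` parameter are assumptions for illustration. The post does not show the raw loop overhead, so the sketch assumes the printed ticks are already net of it and passes 0.

```c
#include <assert.h>

/* Ratio of two tick counts after subtracting a common loop overhead.
   `overhead` is a hypothetical parameter: the thread does not give the
   empty-loop figure, so callers pass 0 when the ticks are already net. */
double speedup(double slow_ticks, double fast_ticks, double overhead)
{
    assert(fast_ticks > overhead);   /* avoid a zero or negative divisor */
    return (slow_ticks - overhead) / (fast_ticks - overhead);
}
```

With the figures above, `speedup(6630, 2543, 0)` gives about 2.61 for xchg and `speedup(6630, 2528, 0)` gives about 2.62 for the three movs, both in line with the quoted factor of 2.6.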

AW

Re: Easy one or may be not
« Reply #8 on: January 19, 2018, 08:37:56 PM »
I believe that xchg might be faster, simply because it would be a shame for Intel/AMD if it could be profitably replaced by a trio of xors.
However, I don't believe much in speed tests; this has been discussed many times before.

jj2007

Re: Easy one or may be not
« Reply #9 on: January 19, 2018, 09:28:15 PM »
Quote
I don't believe much in speed tests; this has been discussed many times before.

Sure, they are devil's work! Better to have a theory, empirics is for the Warmduscher fraction :greensml:
Core i5:
Code: [Select]
ticks 3*xor     rax     3322
ticks 1*xchg    rax     1294
ticks 3*mov     rax     1247

ticks 3*xor     rax     3339
ticks 1*xchg    rax     1295
ticks 3*mov     rax     1248

ticks 3*xor     rax     3308
ticks 1*xchg    rax     1295
ticks 3*mov     rax     1279

ticks 3*xor     rax     3323
ticks 1*xchg    rax     1280
ticks 3*mov     rax     1279

Intel(R) Celeron(R) CPU  N2840  @ 2.16GHz:
Code: [Select]
ticks 3*xor     rax     7047
ticks 1*xchg    rax     2797
ticks 3*mov     rax     2843

ticks 3*xor     rax     6001
ticks 1*xchg    rax     3109
ticks 3*mov     rax     3156

ticks 3*xor     rax     6171
ticks 1*xchg    rax     3390
ticks 3*mov     rax     3124

ticks 3*xor     rax     7000
ticks 1*xchg    rax     3578
ticks 3*mov     rax     3094
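
To compare the two machines, the four runs can be averaged before taking the ratio. Below is a rough C sketch using the xor and xchg tick counts listed above. The array and function names are made up for illustration, and a plain arithmetic mean is an assumption about how one might summarise the runs, not something the post does itself.

```c
#include <assert.h>
#include <stddef.h>

/* Tick counts copied from the four runs listed above. */
const unsigned i5_xor[]       = {3322, 3339, 3308, 3323};
const unsigned i5_xchg[]      = {1294, 1295, 1295, 1280};
const unsigned celeron_xor[]  = {7047, 6001, 6171, 7000};
const unsigned celeron_xchg[] = {2797, 3109, 3390, 3578};

/* Arithmetic mean of n tick counts. */
double mean_ticks(const unsigned *ticks, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += ticks[i];
    return sum / (double)n;
}

/* Mean xor ticks divided by mean xchg ticks over the four runs. */
double xor_vs_xchg(const unsigned *xor_ticks, const unsigned *xchg_ticks)
{
    return mean_ticks(xor_ticks, 4) / mean_ticks(xchg_ticks, 4);
}
```

With these numbers, `xor_vs_xchg(i5_xor, i5_xchg)` comes out at roughly 2.6 and `xor_vs_xchg(celeron_xor, celeron_xchg)` at roughly 2.0, so the gap is smaller on the Celeron but still about a factor of two.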
« Last Edit: January 19, 2018, 11:15:00 PM by jj2007 »

AW

Re: Easy one or may be not
« Reply #10 on: January 19, 2018, 09:47:40 PM »
Quote
empirics is for the Warmduscher fraction :greensml:
OK, I will make a note. :t