Author Topic: Optimizing Collatz  (Read 430 times)

aw27

  • Member
  • ****
  • Posts: 851
  • Let's Make ASM Great Again!
Re: Optimizing Collatz
« Reply #30 on: November 17, 2017, 03:56:56 AM »
Code timing estimation should only be done with a... timer. Cache, branch prediction, pipelining and out-of-order execution may weigh much more than instructions per cycle or latency.
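For anyone who wants to try that from C rather than assembly, here is a minimal sketch of a "time it several times and keep the best run" harness. It is not anybody's harness from this thread: the names `count_odd` and `best_of_ms` are made up for illustration, and it assumes the standard `clock()` is precise enough for the workload, so use a large `n` in practice.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: count the odd values below n (one parity test per
   iteration). An optimizing compiler may simplify this loop, so inspect the
   generated code before trusting any timing. */
static uint64_t count_odd(uint64_t n) {
    uint64_t odd = 0;
    for (uint64_t i = 0; i < n; i++)
        odd += i & 1;
    return odd;
}

/* Run fn(arg) `runs` times and return the shortest run in milliseconds
   (-1.0 if runs <= 0). The last return value is stored in *result so the
   call has a visible use. */
static double best_of_ms(int runs, uint64_t (*fn)(uint64_t), uint64_t arg,
                         uint64_t *result) {
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        uint64_t v = fn(arg);
        clock_t t1 = clock();
        double ms = 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (result)
            *result = v;
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

Calling `best_of_ms(5, count_odd, 1000000000, &r)` would give one number per candidate loop. Taking the minimum of several runs is one common way to reduce noise from the cache and scheduler effects mentioned above.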

jj2007

  • Member
  • *****
  • Posts: 7728
  • Assembler is fun ;-)
    • MasmBasic
Re: Optimizing Collatz
« Reply #31 on: November 17, 2017, 04:42:38 AM »
bt should be faster than and, for example.

Time it...
Code:
670 ms for empty loop
1358 ms for test al, 1
1497 ms for bt rax, 0
1373 ms for and al, 1

671 ms for empty loop
1341 ms for test al, 1
1482 ms for bt rax, 0
1358 ms for and al, 1

655 ms for empty loop
1357 ms for test al, 1
1482 ms for bt rax, 0
1373 ms for and al, 1

See 64-bit assembly with RichMasm if you want to build the source. Note that the timings include loop overhead (roughly 670 ms for the empty loop), so bt is really pretty slow on this Intel Core i5...
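To make the overhead point concrete, the empty-loop time can be subtracted from each measurement. A small C sketch of that arithmetic, using the first run posted above (the helper name `net_ms` is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: time left once the empty-loop baseline is removed. */
static int net_ms(int measured_ms, int empty_loop_ms) {
    return measured_ms - empty_loop_ms;
}

/* Print the net timings for the first run in the listing above. */
static void print_first_run(void) {
    const int empty = 670; /* ms for the empty loop */
    printf("test al, 1: %d ms net\n", net_ms(1358, empty)); /* 688 */
    printf("bt rax, 0:  %d ms net\n", net_ms(1497, empty)); /* 827 */
    printf("and al, 1:  %d ms net\n", net_ms(1373, empty)); /* 703 */
}
```

With the baseline removed, bt comes out at 827 ms against 688 ms for test, about 20% slower, whereas the raw figures (1497 vs 1358 ms) suggest only about 10%.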

nidud

  • Member
  • *****
  • Posts: 1408
    • https://github.com/nidud/asmc
Re: Optimizing Collatz
« Reply #32 on: November 17, 2017, 05:03:36 AM »
Code:
Intel(R) Core(TM) i5-6500T CPU @ 2.50GHz (AVX2)
----------------------------------------------
-- (1)
    85352 cycles, rep(3000), code(257) 1.asm: bt   eax,0
    44312 cycles, rep(3000), code(321) 2.asm: test eax,1
   164616 cycles, rep(3000), code(193) 3.asm: and  eax,1
-- (2)
    84463 cycles, rep(3000), code(257) 1.asm: bt   eax,0
    43973 cycles, rep(3000), code(321) 2.asm: test eax,1
   164331 cycles, rep(3000), code(193) 3.asm: and  eax,1
-- (3)
    85686 cycles, rep(3000), code(257) 1.asm: bt   eax,0
    45461 cycles, rep(3000), code(321) 2.asm: test eax,1
   166011 cycles, rep(3000), code(193) 3.asm: and  eax,1

total [1 .. 3], 1++
   133746 cycles 2.asm: test eax,1
   255501 cycles 1.asm: bt   eax,0
   494958 cycles 3.asm: and  eax,1

jj2007

  • Member
  • *****
  • Posts: 7728
  • Assembler is fun ;-)
    • MasmBasic
Re: Optimizing Collatz
« Reply #33 on: November 17, 2017, 05:12:29 AM »
Interesting. How do you explain that and eax, 1 is almost a factor of 4 slower than test eax, 1? In my tests above they are equally fast.

nidud

  • Member
  • *****
  • Posts: 1408
    • https://github.com/nidud/asmc
Re: Optimizing Collatz
« Reply #34 on: November 17, 2017, 05:26:58 AM »
Difficult to say. AND does, however, change the content of the register, so maybe that's the reason. Seems consistent with the i3 test:
Code:
1. AMD Athlon(tm) II X2 245 Processor
2. Intel(R) Core(TM) i3-2367M CPU @ 1.40GHz

Instr.  Operands  Size  CPU1  CPU2
------------------------------------------------------
TEST    reg,imm   [6]   1.0   1.1
BT      reg,imm   [4]   1.0   1.6
AND     acc,imm   [3]   2.8   2.5

johnsa

  • Member
  • ****
  • Posts: 580
    • Uasm
Re: Optimizing Collatz
« Reply #35 on: November 21, 2017, 02:02:04 AM »
I believe the difference between AND and TEST is exactly that: TEST avoids a lot of internal logic because it modifies only the flags and leaves the register contents untouched. This also has an impact on out-of-order execution, in that the CPU is aware that the instruction won't affect the contents of the register.
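The architectural difference can be modelled in a few lines of C. This is only a sketch of the visible semantics (which register and flag each instruction writes), not of what the CPU does internally. The names `test_zf`, `and_zf` and `collatz_step` are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models TEST reg, imm: computes reg & imm only to set the zero flag.
   reg is passed by value, so the caller's copy is never modified. */
static bool test_zf(uint64_t reg, uint64_t imm) {
    return (reg & imm) == 0;
}

/* Models AND reg, imm: writes the result back into reg and sets the zero flag. */
static bool and_zf(uint64_t *reg, uint64_t imm) {
    *reg &= imm;
    return *reg == 0;
}

/* One Collatz step: the parity check must leave n intact, because the
   original value is needed right afterwards. */
static uint64_t collatz_step(uint64_t n) {
    bool even = test_zf(n, 1); /* n is still intact here */
    return even ? n / 2 : 3 * n + 1;
}
```

With AND as the parity check, the value would have to be copied first, since the register holds only 0 or 1 afterwards. That write to the register is also what creates the extra dependency referred to above.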