Optimizing Collatz

Started by fylux, November 16, 2017, 10:32:08 PM

aw27

Code timing estimation should only be done with a... timer. Cache, branch prediction, pipelining and out-of-order execution may weigh much more than instructions per cycle or latency.
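To make the "use a timer" advice concrete, here is a minimal C sketch of a timing harness. The names time_ms, empty_loop and parity_loop are made up for illustration and are not from anyone's code in this thread. The volatile variables are only there to keep the compiler from deleting the loops, and which instruction the compiler emits for the parity check is up to the compiler.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for a loop under test. */
typedef uint64_t (*loop_fn)(uint64_t iterations);

/* Baseline: loop overhead only. Returns the last index written. */
uint64_t empty_loop(uint64_t iterations)
{
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < iterations; i++)
        sink = i;
    return sink;
}

/* Loop with a parity check per iteration. Returns how many indices were odd. */
uint64_t parity_loop(uint64_t iterations)
{
    volatile uint64_t odd = 0;
    for (uint64_t i = 0; i < iterations; i++)
        odd += i & 1u;
    return odd;
}

/* Runs body(iterations) and returns the CPU time it took, in milliseconds.
   The loop's return value is stored in *result so the work stays observable. */
double time_ms(loop_fn body, uint64_t iterations, uint64_t *result)
{
    clock_t start = clock();
    *result = body(iterations);
    clock_t stop = clock();
    return (double)(stop - start) * 1000.0 / (double)CLOCKS_PER_SEC;
}
```

Calling time_ms(empty_loop, n, &r) and time_ms(parity_loop, n, &r) with the same large n gives a baseline and a measurement that can be compared. Repeating each measurement several times helps to see how stable the numbers are.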

jj2007

Quote from: fylux on November 17, 2017, 03:36:25 AM
bt should be faster than and for example.

Time it...

670 ms for empty loop
1358 ms for test al, 1
1497 ms for bt rax, 0
1373 ms for and al, 1

671 ms for empty loop
1341 ms for test al, 1
1482 ms for bt rax, 0
1358 ms for and al, 1

655 ms for empty loop
1357 ms for test al, 1
1482 ms for bt rax, 0
1373 ms for and al, 1


See 64-bit assembly with RichMasm if you want to build the source. Note that there is loop overhead included, so bt is really pretty slow on this Intel Core i5...
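Since the loop overhead is included in the figures above, one way to compare the instructions is to subtract the empty-loop time from each measurement. The following C sketch does that for the three runs posted here. The array and function names are invented for this example, and the iteration count is not known, so only totals in milliseconds are computed.

```c
#include <stddef.h>

enum { RUNS = 3 };

/* Timings in ms, copied from the three runs above. */
const int empty_ms[RUNS] = { 670,  671,  655};
const int test_ms[RUNS]  = {1358, 1341, 1357};
const int bt_ms[RUNS]    = {1497, 1482, 1482};
const int and_ms[RUNS]   = {1373, 1358, 1373};

/* Sum over all runs of (measured - baseline), in ms. */
int net_total_ms(const int measured[], const int baseline[], size_t runs)
{
    int total = 0;
    for (size_t i = 0; i < runs; i++)
        total += measured[i] - baseline[i];
    return total;
}
```

For these numbers the net totals are 2060 ms for test al, 1, 2465 ms for bt rax, 0 and 2108 ms for and al, 1. Averaged over the three runs that is roughly 687, 822 and 703 ms, so with the overhead removed bt comes out about 20% slower than test on this machine.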

nidud

#32
deleted

jj2007

Interesting. How do you explain that and eax, 1 is a factor of 4 slower than test eax, 1? In my tests above they are equally fast.

nidud

#34
deleted

johnsa

I believe the difference between AND and TEST is exactly that: TEST avoids a lot of internal logic because it modifies only the flags and leaves the register contents untouched. This also has an impact on out-of-order execution, since the CPU is aware that the instruction won't affect the contents of the register.
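As a rough C analogy for the destructive versus non-destructive distinction (not a statement about what any compiler will emit), the sketch below contrasts a parity check that leaves the value intact with a mask that overwrites it, and uses the non-destructive form in a Collatz step counter. The function names is_odd, mask_low_bit and collatz_steps are hypothetical.

```c
#include <stdint.h>

/* Non-destructive check, analogous to test: n is left intact. */
int is_odd(uint64_t n)
{
    return (n & 1u) != 0;
}

/* Destructive mask, analogous to and: *n is overwritten with its low bit,
   so the caller needs a copy if the original value is still required. */
int mask_low_bit(uint64_t *n)
{
    *n &= 1u;
    return *n != 0;
}

/* Counts Collatz steps until n reaches 1. The parity check must not
   destroy n, because n is used again right after the check.
   No overflow handling: assumes 3*n + 1 stays within 64 bits. */
uint64_t collatz_steps(uint64_t n)
{
    uint64_t steps = 0;
    while (n > 1) {
        n = is_odd(n) ? 3 * n + 1 : n >> 1;
        steps++;
    }
    return steps;
}
```

For example, collatz_steps(6) returns 8 and collatz_steps(27) returns 111.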