Topic: Instruction Timing

InfiniteLoop

Re: Instruction Timing
« Reply #45 on: January 08, 2022, 08:06:24 PM »
This thread seems the most relevant for these thoughts:
1. LEA on Skylake has 3 versions, [rcx+offset], [rcx+rax] and [rcx*2+rax+offset], taking x, 2x and 3x cycles respectively. This was surprising, since I thought there were only "simple" and "complex" address types; apparently there's a "medium" one too.
On Alder Lake, LEA has the same timing for all three.

2. Which is faster?
vxorps xmm31, xmm0, xmm0 or vxorps xmm31, xmm31, xmm31?
The former might not be as clever as it seems, since the zeroing-idiom elimination might not work. I haven't tested it.
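
The three address shapes from point 1 can be told apart mechanically. Below is a rough Python sketch, assuming an address is written as terms joined by "+" inside square brackets. The names `shape`, `relative_cost` and `REPORTED` are hypothetical and only for illustration. The cost values are simply the multiples of x reported in the post for Skylake, not independent measurements, and any shape the post does not mention yields None.

```python
import re

# 32- and 64-bit general-purpose register names (eax..esp, rax..rsp, r8..r15, r8d..r15d).
REG = re.compile(r"[re](?:ax|bx|cx|dx|si|di|bp|sp)|r(?:[89]|1[0-5])d?")

def shape(addr):
    """Return (register count, uses a scale factor, has a displacement)."""
    regs, scaled, disp = 0, False, False
    for term in addr.strip().strip("[]").split("+"):
        term = term.strip().lower()
        if "*" in term:            # e.g. "rcx*2": a scaled index register
            regs += 1
            scaled = True
        elif REG.fullmatch(term):  # plain base or index register
            regs += 1
        else:                      # anything else is treated as a displacement
            disp = True
    return regs, scaled, disp

# Relative cost (in units of x) as reported in the post for Skylake only.
REPORTED = {
    (1, False, True): 1,   # [rcx+offset]
    (2, False, False): 2,  # [rcx+rax]
    (2, True, True): 3,    # [rcx*2+rax+offset]
}

def relative_cost(addr):
    """Look up the reported cost; None means the post says nothing about this shape."""
    return REPORTED.get(shape(addr))

for a in ("[rcx+offset]", "[rcx+rax]", "[rcx*2+rax+offset]", "[rcx*4]"):
    print(a, shape(a), relative_cost(a))
```

On a CPU where all forms cost the same, as the post describes for Alder Lake, the table would collapse to a single value.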

hutch--

Re: Instruction Timing
« Reply #46 on: January 08, 2022, 10:37:17 PM »
These all look like normal complex addressing modes: different opcodes for each, but pretty standard mnemonics. I would be surprised if an instruction like LEA behaved differently from how it did on all earlier CPUs, as that would make the CPU non-standard x86 or x64.
hutch at movsd dot com
http://www.masm32.com

jj2007

Re: Instruction Timing
« Reply #47 on: January 09, 2022, 12:23:03 AM »
Quote from: InfiniteLoop on January 08, 2022, 08:06:24 PM
1. LEA on Skylake has 3 versions [rcx+offset] [rcx+rax] [rcx*2+rax+offset] taking x,2x,3x cycles respectively

Timings are not very stable, as it's a very tight loop:

Code:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

2       cycles for 100 * lea eax, [edx+123]
0       cycles for 100 * lea eax, somestring[edx+ebx]
0       cycles for 100 * lea eax, somestring[4*edx+ebx]
7       cycles for 100 * lea eax, somestring[8*edx+ebx]

0       cycles for 100 * lea eax, [edx+123]
0       cycles for 100 * lea eax, somestring[edx+ebx]
0       cycles for 100 * lea eax, somestring[4*edx+ebx]
8       cycles for 100 * lea eax, somestring[8*edx+ebx]

1       cycles for 100 * lea eax, [edx+123]
3       cycles for 100 * lea eax, somestring[edx+ebx]
0       cycles for 100 * lea eax, somestring[4*edx+ebx]
8       cycles for 100 * lea eax, somestring[8*edx+ebx]

0       cycles for 100 * lea eax, [edx+123]
0       cycles for 100 * lea eax, somestring[edx+ebx]
2       cycles for 100 * lea eax, somestring[4*edx+ebx]
10      cycles for 100 * lea eax, somestring[8*edx+ebx]

Size is 7 bytes for all, while lea eax, [edx+1234] would be 10 bytes.
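
Since the four runs above disagree with each other, one way to read them is to summarise each instruction over all runs instead of trusting a single figure. Below is a minimal Python sketch, assuming the harness output format shown above. The `summarize` helper and the `LINE` pattern are hypothetical names for illustration and are not part of the test program. Lines reporting "??" are skipped, on the assumption that they mark a run that could not be timed.

```python
import re
from statistics import median

# One line of harness output looks like:
#   "7       cycles for 100 * lea eax, somestring[8*edx+ebx]"
LINE = re.compile(r"^\s*(\d+|\?\?)\s+cycles for (\d+) \* (.+?)\s*$")

def summarize(report):
    """Group cycle counts by instruction and return {instruction: (min, median)}."""
    runs = {}
    for line in report.splitlines():
        m = LINE.match(line)
        if m is None or m.group(1) == "??":
            continue  # blank line, other text, or an untimed run
        runs.setdefault(m.group(3), []).append(int(m.group(1)))
    return {instr: (min(v), median(v)) for instr, v in runs.items()}

# The four runs copied from the i5-2450M output above.
REPORT = """
2       cycles for 100 * lea eax, [edx+123]
0       cycles for 100 * lea eax, somestring[edx+ebx]
0       cycles for 100 * lea eax, somestring[4*edx+ebx]
7       cycles for 100 * lea eax, somestring[8*edx+ebx]

0       cycles for 100 * lea eax, [edx+123]
0       cycles for 100 * lea eax, somestring[edx+ebx]
0       cycles for 100 * lea eax, somestring[4*edx+ebx]
8       cycles for 100 * lea eax, somestring[8*edx+ebx]

1       cycles for 100 * lea eax, [edx+123]
3       cycles for 100 * lea eax, somestring[edx+ebx]
0       cycles for 100 * lea eax, somestring[4*edx+ebx]
8       cycles for 100 * lea eax, somestring[8*edx+ebx]

0       cycles for 100 * lea eax, [edx+123]
0       cycles for 100 * lea eax, somestring[edx+ebx]
2       cycles for 100 * lea eax, somestring[4*edx+ebx]
10      cycles for 100 * lea eax, somestring[8*edx+ebx]
"""

for instr, (lo, mid) in summarize(REPORT).items():
    print(f"{instr:40} min={lo:2}  median={mid}")
```

With these numbers only the 8*edx form stands out consistently (minimum 7, median 8); the other three stay within the noise of the tight loop.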

TimoVJL

Re: Instruction Timing
« Reply #48 on: January 09, 2022, 05:22:45 AM »
Code:
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

5       cycles for 100 * lea eax, [edx+123]
43      cycles for 100 * lea eax, somestring[edx+ebx]
47      cycles for 100 * lea eax, somestring[4*edx+ebx]
47      cycles for 100 * lea eax, somestring[8*edx+ebx]

7       cycles for 100 * lea eax, [edx+123]
35      cycles for 100 * lea eax, somestring[edx+ebx]
57      cycles for 100 * lea eax, somestring[4*edx+ebx]
49      cycles for 100 * lea eax, somestring[8*edx+ebx]

9       cycles for 100 * lea eax, [edx+123]
35      cycles for 100 * lea eax, somestring[edx+ebx]
71      cycles for 100 * lea eax, somestring[4*edx+ebx]
32      cycles for 100 * lea eax, somestring[8*edx+ebx]

6       cycles for 100 * lea eax, [edx+123]
46      cycles for 100 * lea eax, somestring[edx+ebx]
57      cycles for 100 * lea eax, somestring[4*edx+ebx]
43      cycles for 100 * lea eax, somestring[8*edx+ebx]

7       bytes for lea eax, [edx+123]
7       bytes for lea eax, somestring[edx+ebx]
7       bytes for lea eax, somestring[4*edx+ebx]
7       bytes for lea eax, somestring[8*edx+ebx]
May the source be with you

FORTRANS

Re: Instruction Timing
« Reply #49 on: January 09, 2022, 09:15:08 AM »
Hi,

   Results from two laptops:

Code:
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

0 cycles for 100 * lea eax, [edx+123]
?? cycles for 100 * lea eax, somestring[edx+ebx]
0 cycles for 100 * lea eax, somestring[4*edx+ebx]
?? cycles for 100 * lea eax, somestring[8*edx+ebx]

0 cycles for 100 * lea eax, [edx+123]
?? cycles for 100 * lea eax, somestring[edx+ebx]
0 cycles for 100 * lea eax, somestring[4*edx+ebx]
?? cycles for 100 * lea eax, somestring[8*edx+ebx]

0 cycles for 100 * lea eax, [edx+123]
?? cycles for 100 * lea eax, somestring[edx+ebx]
0 cycles for 100 * lea eax, somestring[4*edx+ebx]
?? cycles for 100 * lea eax, somestring[8*edx+ebx]

0 cycles for 100 * lea eax, [edx+123]
?? cycles for 100 * lea eax, somestring[edx+ebx]
0 cycles for 100 * lea eax, somestring[4*edx+ebx]
?? cycles for 100 * lea eax, somestring[8*edx+ebx]

7 bytes for lea eax, [edx+123]
7 bytes for lea eax, somestring[edx+ebx]
7 bytes for lea eax, somestring[4*edx+ebx]
7 bytes for lea eax, somestring[8*edx+ebx]


--- ok ---

Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (SSE4)

14 cycles for 100 * lea eax, [edx+123]
24 cycles for 100 * lea eax, somestring[edx+ebx]
20 cycles for 100 * lea eax, somestring[4*edx+ebx]
30 cycles for 100 * lea eax, somestring[8*edx+ebx]

32 cycles for 100 * lea eax, [edx+123]
2 cycles for 100 * lea eax, somestring[edx+ebx]
23 cycles for 100 * lea eax, somestring[4*edx+ebx]
?? cycles for 100 * lea eax, somestring[8*edx+ebx]

?? cycles for 100 * lea eax, [edx+123]
?? cycles for 100 * lea eax, somestring[edx+ebx]
6 cycles for 100 * lea eax, somestring[4*edx+ebx]
?? cycles for 100 * lea eax, somestring[8*edx+ebx]

3 cycles for 100 * lea eax, [edx+123]
?? cycles for 100 * lea eax, somestring[edx+ebx]
2 cycles for 100 * lea eax, somestring[4*edx+ebx]
?? cycles for 100 * lea eax, somestring[8*edx+ebx]

7 bytes for lea eax, [edx+123]
7 bytes for lea eax, somestring[edx+ebx]
7 bytes for lea eax, somestring[4*edx+ebx]
7 bytes for lea eax, somestring[8*edx+ebx]


--- ok ---

Regards,

Steve

LiaoMi

Re: Instruction Timing
« Reply #50 on: January 10, 2022, 01:28:52 AM »
Hi,

Code:
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)

4       cycles for 100 * lea eax, [edx+123]
10      cycles for 100 * lea eax, somestring[edx+ebx]
25      cycles for 100 * lea eax, somestring[4*edx+ebx]
26      cycles for 100 * lea eax, somestring[8*edx+ebx]

4       cycles for 100 * lea eax, [edx+123]
8       cycles for 100 * lea eax, somestring[edx+ebx]
27      cycles for 100 * lea eax, somestring[4*edx+ebx]
27      cycles for 100 * lea eax, somestring[8*edx+ebx]

6       cycles for 100 * lea eax, [edx+123]
10      cycles for 100 * lea eax, somestring[edx+ebx]
30      cycles for 100 * lea eax, somestring[4*edx+ebx]
29      cycles for 100 * lea eax, somestring[8*edx+ebx]

7       cycles for 100 * lea eax, [edx+123]
12      cycles for 100 * lea eax, somestring[edx+ebx]
31      cycles for 100 * lea eax, somestring[4*edx+ebx]
32      cycles for 100 * lea eax, somestring[8*edx+ebx]

7       bytes for lea eax, [edx+123]
7       bytes for lea eax, somestring[edx+ebx]
7       bytes for lea eax, somestring[4*edx+ebx]
7       bytes for lea eax, somestring[8*edx+ebx]


--- ok ---

guga

Re: Instruction Timing
« Reply #51 on: January 14, 2022, 04:52:37 PM »
AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

??      cycles for 100 * lea eax, [edx+123]
37      cycles for 100 * lea eax, somestring[edx+ebx]
76      cycles for 100 * lea eax, somestring[4*edx+ebx]
42      cycles for 100 * lea eax, somestring[8*edx+ebx]

0       cycles for 100 * lea eax, [edx+123]
32      cycles for 100 * lea eax, somestring[edx+ebx]
74      cycles for 100 * lea eax, somestring[4*edx+ebx]
40      cycles for 100 * lea eax, somestring[8*edx+ebx]

2       cycles for 100 * lea eax, [edx+123]
35      cycles for 100 * lea eax, somestring[edx+ebx]
78      cycles for 100 * lea eax, somestring[4*edx+ebx]
41      cycles for 100 * lea eax, somestring[8*edx+ebx]

0       cycles for 100 * lea eax, [edx+123]
33      cycles for 100 * lea eax, somestring[edx+ebx]
68      cycles for 100 * lea eax, somestring[4*edx+ebx]
35      cycles for 100 * lea eax, somestring[8*edx+ebx]

7       bytes for lea eax, [edx+123]
7       bytes for lea eax, somestring[edx+ebx]
7       bytes for lea eax, somestring[4*edx+ebx]
7       bytes for lea eax, somestring[8*edx+ebx]


--- ok ---
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Re: Instruction Timing
« Reply #52 on: January 14, 2022, 04:59:48 PM »
Code:
AMD Ryzen 5 2400G with Radeon Vega Graphics     (AVX2)
------------------------------------------------
Instr.     Operands         Bytes  Clocks
------------------------------------------------
adc      reg64,reg64          3       3
adc      reg64,mem128         6       2
adc      reg64,imm8           4       2
adc      mem128,reg64         6       4
adc      mem128,imm8          7       4
add      reg64,reg64          3       3
add      reg64,mem128         6       2
add      reg64,imm8           4       1
add      mem128,reg64         6       4
add      mem128,imm8          7       5
and      reg64,reg64          3       1
and      reg64,mem128         6       2
and      reg64,imm8           4       2
and      mem128,reg64         6       4
and      mem128,imm8          7       4
bsf      reg64,reg64          4      12
bsf      reg64,mem128         7      16
bsr      reg64,reg64          4      16
bsr      reg64,mem128         7      22
bswap    reg32                2       2
bswap    reg64                3       2
bt       reg64,reg64          4       1
bt       reg64,imm8           5       1
bt       mem16,reg16          6      12
bt       mem16,imm8           6       2
btc      reg64,reg64          4       3
btc      reg64,imm8           5       3
btc      mem16,imm8           6       8
btr      reg64,reg64          4       3
btr      reg64,imm8           5       3
btr      mem16,imm8           6       8
bts      reg64,reg64          4       3
bts      reg64,imm8           5       3
bts      mem16,imm8           6       8
call     reg64                2      16
cbw                           2       4
cdq                           1       1
clc                           1       2
cld                           1      12
cmp      reg64,reg64          3       1
cmp      reg64,imm8           4       1
cmp      mem128,reg64         6       2
cmp      mem128,imm8          7       2
cmpsb                         1      12
cmpsw                         2      12
cmpsd                         1      12
cmpxchg  reg64,reg64          4      12
cmpxchg  mem128,reg64         7      12
cwd                           2       3
cwde                          1       4
dec      reg8                 2       1
dec      reg64                3       1
dec      mem8                 2       4
dec      mem128               6       4
div      reg64                8      56
enter    imm8,imm8            4      60
idiv     reg32                7      57
imul     reg8                 2      11
imul     reg16                3      13
imul     reg32                2      12
imul     reg64                3      12
imul     mem8                 2      12
imul     mem16                4      12
imul     mem32                4      11
imul     mem128               6      11
imul     reg16,reg16          4       5
imul     reg32,reg32          3       5
imul     reg64,reg64          4       5
imul     reg16,reg16,imm8     4       7
imul     reg32,reg32,imm8     3       4
imul     reg64,reg64,imm8     4       4
inc      reg8                 2       1
inc      reg64                3       1
inc      mem8                 2       4
inc      mem128               6       4
lahf                          1       8
lar      reg16,reg16          4     320
lar      reg32,reg32          3     303
lea      reg64,mem128         6       1
lodsb                         1      12
lodsw                         2      12
lodsd                         1      12
mov      reg64,reg64          3       1
mov      reg64,mem128         6       2
mov      reg64,imm8           7       1
mov      mem128,reg64         6       4
mov      mem128,imm8         10       4
movsb                         1      12
movsw                         2      12
movsd                         1      12
movsx    reg32,reg8           3       1
movsx    reg32,mem8           4       2
movsx    reg64,reg16          4       1
movsx    reg64,mem16          5       2
movzx    reg32,reg8           3       1
movzx    reg32,mem8           4       2
movzx    reg64,reg16          4       1
movzx    reg64,mem16          5       2
mul      reg8                 2      11
mul      reg16                3      13
mul      reg32                2      12
mul      reg64                3      12
mul      mem8                 2      11
mul      mem16                4      13
mul      mem32                4      12
mul      mem128               6      12
neg      reg8                 2       1
neg      reg64                3       2
neg      mem8                 2       4
neg      mem128               6       4
nop                           1       1
not      reg8                 2       1
not      reg64                3       1
not      mem32                4       4
not      mem128               6       5
or       reg8,reg8            2       1
or       reg64,reg64          3       1
or       reg64,mem128         6       2
or       reg64,imm8           4       1
or       mem8,reg8            3       4
or       mem128,reg64         6       4
or       mem128,imm8          7       4
pop      reg64                1       2
popfq                         4      55
push     reg64                1       4