Mul instruction

FORTRANS · May 05, 2014, 10:36:50 PM

Hi,

Using the median of the timing results (rounded), block B relative to
block A: AMD Athlon 6.5%, P-III 1.8%, P-MMX 7.8%, and Pentium M -1.7%.
Cute results, thank you for the benchmark. That doesn't make the 15%
rule in any event.

pre-P4 (SSE1)

1698	cycles for 100 * block A
1728	cycles for 100 * block B

1711	cycles for 100 * block A
1732	cycles for 100 * block B

1701	cycles for 100 * block A
1733	cycles for 100 * block B

111	bytes for block A
101	bytes for block B


--- ok ---

pre-P4
6471	cycles for 100 * block A
6974	cycles for 100 * block B

6481	cycles for 100 * block A
6978	cycles for 100 * block B

6472	cycles for 100 * block A
6979	cycles for 100 * block B

111	bytes for block A
101	bytes for block B


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

1360	cycles for 100 * block A
1335	cycles for 100 * block B

1358	cycles for 100 * block A
1331	cycles for 100 * block B

1356	cycles for 100 * block A
1337	cycles for 100 * block B

111	bytes for block A
101	bytes for block B


--- ok ---
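
For anyone who wants to check the percentages quoted at the top of this post, here is a small sketch. It assumes the figure is the relative difference of the medians, (B - A) / A, which is not stated in the post but does reproduce the three values for the machines listed here. The dictionary keys and the helper name `percent_diff` are only illustrative.

```python
from statistics import median

# Cycle counts copied from the three runs listed above: (block A, block B).
timings = {
    "pre-P4 (SSE1)": ([1698, 1711, 1701], [1728, 1732, 1733]),
    "pre-P4":        ([6471, 6481, 6472], [6974, 6978, 6979]),
    "Pentium M":     ([1360, 1358, 1356], [1335, 1331, 1337]),
}

def percent_diff(block_a, block_b):
    """Relative difference of median(B) against median(A), in percent.

    Assumed formula: 100 * (median(B) - median(A)) / median(A).
    """
    med_a = median(block_a)
    med_b = median(block_b)
    return 100.0 * (med_b - med_a) / med_a

# Round to one decimal place, as in the post.
results = {cpu: round(percent_diff(a, b), 1) for cpu, (a, b) in timings.items()}

for cpu, pct in results.items():
    print(f"{cpu}: {pct:+.1f}%")
# pre-P4 (SSE1): +1.8%
# pre-P4: +7.8%
# Pentium M: -1.7%
```

A positive value means block B took more cycles than block A.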

Regards,

Steve N.

Gunther · May 06, 2014, 01:56:58 AM

Jochen,

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

896     cycles for 100 * block A
896     cycles for 100 * block B

895     cycles for 100 * block A
895     cycles for 100 * block B

895     cycles for 100 * block A
894     cycles for 100 * block B

111     bytes for block A
101     bytes for block B

--- ok ---

Gunther

jj2007 · May 06, 2014, 03:03:03 AM

Thanks - and here is my old machine:

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)

1349 cycles for 100 * block A
1329 cycles for 100 * block B

1349 cycles for 100 * block A
1328 cycles for 100 * block B

1352 cycles for 100 * block A
1331 cycles for 100 * block B

111 bytes for block A
101 bytes for block B

gelatine1 · May 07, 2014, 04:55:04 AM

Why would someone use the IMUL that only stores into eax? It only allows multiplying two numbers a and b where log2(a) + log2(b) is smaller than 32, whereas the MUL instruction increases that bound to 64.
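
The bound in the question can be tried out numerically. The following is only a model, not assembly: registers are represented by Python integers with an explicit 32-bit mask, and the function names (`low32`, `widening_mul`, `fits_in_32`) are made up for illustration. The bit-length check is a sufficient condition for the product to fit, not a necessary one.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit register holds values 0 .. 2**32 - 1

def low32(a, b):
    """Low 32 bits of the product: all that a single 32-bit destination keeps."""
    return (a * b) & MASK32

def widening_mul(a, b):
    """Full unsigned 64-bit product as (high, low) halves, like MUL's EDX:EAX."""
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32

def fits_in_32(a, b):
    """Sufficient test: bit lengths summing to at most 32 guarantee no overflow."""
    return a.bit_length() + b.bit_length() <= 32

for a, b in [(0xFFFF, 0xFFFF), (0x10000, 0x10000)]:
    high, low = widening_mul(a, b)
    print(f"{a:#x} * {b:#x}: fits={fits_in_32(a, b)} "
          f"low32={low32(a, b):#010x} edx:eax={high:#010x}:{low:#010x}")
```

With two 16-bit inputs the low 32 bits are the whole product; with two 17-bit inputs the low half wraps to zero and the result only survives in the high half.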

dedndave · May 07, 2014, 05:03:19 AM

it handles signed integers
it allows immediate operands
it allows multiplication of registers other than EAX (for MUL, one operand must always be in EAX)
it's probably a little faster, depending on the CPU

IMUL is often handy for calculating addresses that LEA won't handle (array index, etc)
you use the one that's most appropriate for what you want to do
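
A rough model of two of the points above, again in Python with explicit 32-bit masks rather than real assembly. It assumes the usual x86 behaviour: the one-operand forms of both MUL and IMUL leave a 64-bit result in EDX:EAX, the low half is the same for signed and unsigned multiplication, and LEA's scale factor is limited to 1, 2, 4 or 8. All function names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul_edx_eax(a, b):
    """Model of one-operand MUL: unsigned product as (EDX, EAX)."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def imul_edx_eax(a, b):
    """Model of one-operand IMUL: signed product as (EDX, EAX)."""
    product = to_signed32(a) * to_signed32(b)
    return (product >> 32) & MASK32, product & MASK32

def element_address(base, index, size):
    """base + index * size, wrapped to 32 bits.

    With a size such as 12, LEA alone cannot do the scaling, so a
    multiply by an immediate followed by an add is one way to get it.
    """
    return (base + index * size) & MASK32

# 0xFFFFFFFF is 4294967295 unsigned but -1 signed.
print([hex(v) for v in mul_edx_eax(0xFFFFFFFF, 2)])   # ['0x1', '0xfffffffe']
print([hex(v) for v in imul_edx_eax(0xFFFFFFFF, 2)])  # ['0xffffffff', '0xfffffffe']
print(hex(element_address(0x1000, 5, 12)))            # 0x103c
```

The low halves match and only the high halves differ, which is why the choice between the two matters mainly when signed values or the upper 32 bits are involved.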

The MASM Forum
