Mul instruction

Started by gelatine1, May 01, 2014, 04:36:56 PM

FORTRANS

Hi,

   Using the median of the timing results (rounded), the difference of
block B relative to block A is: AMD Athlon 6.5%, P-III 1.8%, P-MMX 7.8%,
and Pentium M -1.7% (B is faster there).  Cute results, thank you for
the benchmark.  None of them reaches the 15% rule in any event.



pre-P4 (SSE1)

1698 cycles for 100 * block A
1728 cycles for 100 * block B

1711 cycles for 100 * block A
1732 cycles for 100 * block B

1701 cycles for 100 * block A
1733 cycles for 100 * block B

111 bytes for block A
101 bytes for block B


--- ok ---

pre-P4

6471 cycles for 100 * block A
6974 cycles for 100 * block B

6481 cycles for 100 * block A
6978 cycles for 100 * block B

6472 cycles for 100 * block A
6979 cycles for 100 * block B

111 bytes for block A
101 bytes for block B


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

1360 cycles for 100 * block A
1335 cycles for 100 * block B

1358 cycles for 100 * block A
1331 cycles for 100 * block B

1356 cycles for 100 * block A
1337 cycles for 100 * block B

111 bytes for block A
101 bytes for block B


--- ok ---
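
For anyone who wants to check the percentages quoted at the top of this post, here is a small sketch of the arithmetic. It is Python rather than assembly, purely for convenience. The cycle counts are copied from the three listings above; matching "pre-P4 (SSE1)" to the P-III and plain "pre-P4" to the P-MMX is an assumption based on which figures reproduce the quoted values, and `percent_diff` is a made-up helper name. The AMD Athlon timings are not in this post, so they are left out.

```python
from statistics import median

# Cycle counts for 100 * block A and 100 * block B, copied from the listings.
# The CPU labels are an assumption (see the note above the code).
timings = {
    "P-III (pre-P4, SSE1)": ([1698, 1711, 1701], [1728, 1732, 1733]),
    "P-MMX (pre-P4)": ([6471, 6481, 6472], [6974, 6978, 6979]),
    "Pentium M": ([1360, 1358, 1356], [1335, 1331, 1337]),
}

def percent_diff(a_runs, b_runs):
    """Difference of the medians, (B - A) / A, in percent."""
    a = median(a_runs)
    b = median(b_runs)
    return 100.0 * (b - a) / a

for cpu, (a_runs, b_runs) in timings.items():
    print(f"{cpu}: {percent_diff(a_runs, b_runs):+.1f}%")
```

A positive value means block B took more cycles than block A; a negative one means B was faster.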


Regards,

Steve N.

Gunther

Jochen,


Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

896     cycles for 100 * block A
896     cycles for 100 * block B

895     cycles for 100 * block A
895     cycles for 100 * block B

895     cycles for 100 * block A
894     cycles for 100 * block B

111     bytes for block A
101     bytes for block B

--- ok ---


Gunther
You have to know the facts before you can distort them.

jj2007

Thanks - and here is my old machine:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

1349    cycles for 100 * block A
1329    cycles for 100 * block B

1349    cycles for 100 * block A
1328    cycles for 100 * block B

1352    cycles for 100 * block A
1331    cycles for 100 * block B

111     bytes for block A
101     bytes for block B

gelatine1

Why would someone use the IMUL form that only stores its result in eax? It only allows multiplying two numbers a and b where log2(a) + log2(b) is smaller than 32, whereas the MUL instruction raises that bound to 64.
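
To make the bound in the question concrete, here is a rough model in Python (not MASM, and only a sketch). `mul32` and `imul32_truncating` are hypothetical helper names: the first imitates one-operand MUL, which leaves the full 64-bit unsigned product in EDX:EAX, and the second imitates the two-operand IMUL form, which keeps only the low 32 bits of the product in the destination register.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, src):
    """Model of one-operand MUL: unsigned 32 x 32 -> 64 bits, as (edx, eax)."""
    product = (eax & MASK32) * (src & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def imul32_truncating(dest, src):
    """Model of two-operand IMUL: only the low 32 bits of the product survive."""
    return (dest * src) & MASK32

a = b = 100000                      # each fits in 17 bits, product needs 34
edx, eax = mul32(a, b)
print(f"MUL : edx={edx:#x} eax={eax:#x}")
print(f"IMUL: dest={imul32_truncating(a, b):#x}")
print("full product:", (edx << 32) | eax)
```

With a = b = 100000 the product is 10,000,000,000, which does not fit in 32 bits: the modelled MUL keeps it whole across EDX:EAX, while the truncating IMUL form is left with just the low half. Note that the one-operand form of IMUL also writes a 64-bit result to EDX:EAX; the truncation applies to the two- and three-operand forms.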

dedndave

it handles signed integers
it allows immediate operands
it allows multiplication of registers other than EAX (for MUL, one operand must always be in EAX)
it's probably a little faster, depending on the CPU

IMUL is often handy for calculating addresses that LEA won't handle (an array index scaled by an arbitrary element size, etc)
you use the one that's most appropriate for what you want to do
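
A few of the points above can be sketched in Python (a model of the register arithmetic, not MASM). All function names are made up for illustration: `mul_edx_eax` and `imul_edx_eax` imitate the one-operand forms, and `imul_imm` imitates the three-operand form with an immediate, as in `imul eax, ecx, 12`. The 12-byte element size is an assumed example value.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul_edx_eax(eax, src):
    """Model of one-operand MUL: operands treated as unsigned."""
    p = (eax & MASK32) * (src & MASK32)
    return p >> 32, p & MASK32

def imul_edx_eax(eax, src):
    """Model of one-operand IMUL: operands treated as signed."""
    p = (to_signed32(eax) * to_signed32(src)) & MASK64
    return p >> 32, p & MASK32

def imul_imm(src, imm):
    """Model of three-operand IMUL: dest = low 32 bits of src * immediate."""
    return (to_signed32(src) * imm) & MASK32

minus_two = 0xFFFFFFFE              # bit pattern of -2 in a 32-bit register
print("MUL  -2 * 3:", [hex(v) for v in mul_edx_eax(minus_two, 3)])
print("IMUL -2 * 3:", [hex(v) for v in imul_edx_eax(minus_two, 3)])
print("offset of element 5, 12-byte elements:", imul_imm(5, 12))
```

The low halves agree, but only the IMUL model sign-extends into EDX, so EDX:EAX reads as -6 rather than a large unsigned number. The last line is the array-index case: the index can sit in any register and the element size is an immediate, with no need to route anything through EAX.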