quickest way to retrieve HO and LO words of a dword ?

Started by gelatine1, November 12, 2014, 01:11:50 AM


hutch--

Dave,

You can expect those types of variations on different hardware. My old Core2 quad behaves much more like the i7 I am using now, but some of the earlier Core2 chips were internally different.

These are the results on my i7.


1442    cycles for 500 * movzx
1453    cycles for 500 * shr & and FFFFh
1450    cycles for 500 * movsx

1443    cycles for 500 * movzx
1453    cycles for 500 * shr & and FFFFh
1451    cycles for 500 * movsx

1443    cycles for 500 * movzx
1453    cycles for 500 * shr & and FFFFh
1451    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx


This is the result on an older Core2 quad.


Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)

2098    cycles for 500 * movzx
1030    cycles for 500 * shr & and FFFFh
2128    cycles for 500 * movsx

2097    cycles for 500 * movzx
1046    cycles for 500 * shr & and FFFFh
2025    cycles for 500 * movsx

2108    cycles for 500 * movzx
1028    cycles for 500 * shr & and FFFFh
2020    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx


This is the later quad.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)

2103    cycles for 500 * movzx
1030    cycles for 500 * shr & and FFFFh
2015    cycles for 500 * movsx

2100    cycles for 500 * movzx
1042    cycles for 500 * shr & and FFFFh
2009    cycles for 500 * movsx

2100    cycles for 500 * movzx
1028    cycles for 500 * shr & and FFFFh
2013    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx


While I am at it, this is my old PIV.


Genuine Intel(R) CPU 3.80GHz (SSE3)

1861    cycles for 500 * movzx
2103    cycles for 500 * shr & and FFFFh
1777    cycles for 500 * movsx

1823    cycles for 500 * movzx
2058    cycles for 500 * shr & and FFFFh
1864    cycles for 500 * movsx

1814    cycles for 500 * movzx
2140    cycles for 500 * shr & and FFFFh
2091    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx
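
For readers who do not have the test piece to hand, here is a minimal sketch of what the three labels in these tables could refer to. The benchmark source is not shown in this part of the thread, so the three bodies below are assumptions, and `testval`, `loPart` and `hiPart` are made-up names. It assumes a default MASM32 SDK install and a console build.

```asm
; Sketch only: not the code that produced the timings above.
include \masm32\include\masm32rt.inc

.data
  testval dd 8001FFFEh          ; hypothetical value, bit 15 set in both words

.data?
  loPart dd ?
  hiPart dd ?

.code
start:
  ; ---- movzx: zero-extend each 16-bit half straight from memory
    movzx eax, WORD PTR [testval]
    movzx ecx, WORD PTR [testval+2]
    mov loPart, eax             ; 0000FFFEh
    mov hiPart, ecx             ; 00008001h
    print uhex$(loPart),"h  loword via movzx",13,10
    print uhex$(hiPart),"h  hiword via movzx",13,10

  ; ---- shr & and FFFFh: load the dword once, mask and shift in registers
    mov eax, testval
    mov ecx, eax
    and eax, 0FFFFh             ; keep the low word
    shr ecx, 16                 ; move the high word down, upper bits become 0
    mov loPart, eax             ; 0000FFFEh
    mov hiPart, ecx             ; 00008001h
    print uhex$(loPart),"h  loword via and",13,10
    print uhex$(hiPart),"h  hiword via shr",13,10

  ; ---- movsx: sign-extend, so words of 8000h and above come out negative
    movsx eax, WORD PTR [testval]
    movsx ecx, WORD PTR [testval+2]
    mov loPart, eax             ; FFFFFFFEh
    mov hiPart, ecx             ; FFFF8001h
    print uhex$(loPart),"h  loword via movsx",13,10
    print uhex$(hiPart),"h  hiword via movsx",13,10

    exit
end start
```

Note that movsx is not a drop-in replacement for the other two: it only gives the same answer when bit 15 of the word is clear. Which one you want depends on whether the words are meant to be signed, for example coordinates that can go negative.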




TouEnMasm

Without comments
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)

1771    cycles for 500 * movzx
2087    cycles for 500 * shr & and FFFFh
1792    cycles for 500 * movsx

1804    cycles for 500 * movzx
2083    cycles for 500 * shr & and FFFFh
1796    cycles for 500 * movsx

1772    cycles for 500 * movzx
2085    cycles for 500 * shr & and FFFFh
1792    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx
Fa is a musical note to play with CL

hutch--

With some humour, on my i7 the following 4 algos clock almost identically.


  ; ------------------------------------- 1

    movzx eax, WORD PTR [lParam]
    movzx ecx, WORD PTR [lParam+2]

    mov tx, eax
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; ------------------------------------- 2

    mov edx, lParam

    movzx eax, dx
    rol edx, 16
    movzx ecx, dx

    mov tx, eax
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; ------------------------------------- 3

    mov edx, lParam

    mov ecx, edx
    and edx, 0000FFFFh
    rol ecx, 16
    and ecx, 0000FFFFh

    mov tx, edx
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; ------------------------------------- 4

    mov edx, lParam

    and edx, 0000FFFFh
    movzx ecx, WORD PTR [lParam+2]

    mov tx, edx
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; -------------------------------------
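
If you want to clock one of these yourself, the following is a rough sketch using rdtsc directly. It is not the timing harness used for the tables in this thread, and `testval` is a made-up name. A single pass like this includes the cpuid/rdtsc overhead and is easily disturbed by task switches, so treat the number as indicative only; a proper harness repeats the measurement many times and subtracts the empty-loop overhead.

```asm
; Sketch only: one raw rdtsc measurement around 500 unrolled copies of algo 1.
include \masm32\include\masm32rt.inc
.586                            ; allow cpuid and rdtsc

.data
  testval dd 12345678h          ; hypothetical test value

.code
start:
    push ebx                    ; cpuid overwrites ebx
    push esi

    xor eax, eax
    cpuid                       ; serialize before reading the counter
    rdtsc
    mov esi, eax                ; keep the low 32 bits of the start count

    REPEAT 500
      movzx eax, WORD PTR [testval]
      movzx ecx, WORD PTR [testval+2]
    ENDM

    xor eax, eax
    cpuid                       ; serialize again before the second read
    rdtsc
    sub eax, esi                ; elapsed count, overhead included

    print str$(eax)," raw TSC ticks for 500 * movzx pair",13,10

    pop esi
    pop ebx
    exit
end start
```

Swap the two lines inside the REPEAT block for the body of algo 2, 3 or 4 to compare them on your own box.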

FORTRANS

pre-P4 (SSE1)

2027   cycles for 500 * movzx
2532   cycles for 500 * shr & and FFFFh
2043   cycles for 500 * movsx

2028   cycles for 500 * movzx
2534   cycles for 500 * shr & and FFFFh
2027   cycles for 500 * movsx

2026   cycles for 500 * movzx
2535   cycles for 500 * shr & and FFFFh
2026   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

pre-P4
4164   cycles for 500 * movzx
2577   cycles for 500 * shr & and FFFFh
4181   cycles for 500 * movsx

4138   cycles for 500 * movzx
2655   cycles for 500 * shr & and FFFFh
4188   cycles for 500 * movsx

4146   cycles for 500 * movzx
2605   cycles for 500 * shr & and FFFFh
4169   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

1530   cycles for 500 * movzx
2035   cycles for 500 * shr & and FFFFh
1527   cycles for 500 * movsx

1533   cycles for 500 * movzx
2036   cycles for 500 * shr & and FFFFh
1529   cycles for 500 * movsx

1529   cycles for 500 * movzx
2039   cycles for 500 * shr & and FFFFh
1535   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

1004   cycles for 500 * movzx
1378   cycles for 500 * shr & and FFFFh
1013   cycles for 500 * movsx

1003   cycles for 500 * movzx
1379   cycles for 500 * shr & and FFFFh
1004   cycles for 500 * movsx

1020   cycles for 500 * movzx
1236   cycles for 500 * shr & and FFFFh
1363   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

Gunther

Jochen,

the results:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

898     cycles for 500 * movzx
919     cycles for 500 * shr & and FFFFh
918     cycles for 500 * movsx

901     cycles for 500 * movzx
917     cycles for 500 * shr & and FFFFh
918     cycles for 500 * movsx

949     cycles for 500 * movzx
931     cycles for 500 * shr & and FFFFh
924     cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx

--- ok ---

Gunther
You have to know the facts before you can distort them.

MichaelW

Pentium G3220 under Windows 7 64-bit:

Intel(R) Pentium(R) CPU G3220 @ 3.00GHz (SSE4)

1001    cycles for 500 * movzx
1017    cycles for 500 * shr & and FFFFh
1005    cycles for 500 * movsx

1002    cycles for 500 * movzx
1018    cycles for 500 * shr & and FFFFh
1001    cycles for 500 * movsx

1006    cycles for 500 * movzx
1017    cycles for 500 * shr & and FFFFh
1002    cycles for 500 * movsx
Well Microsoft, here's another nice mess you've gotten us into.

Siekmanski

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4) Windows 8.1 64 bit

1029    cycles for 500 * movzx
1057    cycles for 500 * shr & and FFFFh
1059    cycles for 500 * movsx

1039    cycles for 500 * movzx
1058    cycles for 500 * shr & and FFFFh
1059    cycles for 500 * movsx

1038    cycles for 500 * movzx
1056    cycles for 500 * shr & and FFFFh
1059    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx
Creative coders use backward thinking techniques as a strategy.

jj2007

.if OldIntelQuadOrSimilar
   shr ...
.else
   movsx ...
.endif

One cycle more for the extra jump, of course - shall I post timings?
;)

Gunther

Jochen,

Quote from: jj2007 on November 13, 2014, 12:11:14 PM
.if OldIntelQuadOrSimilar
   shr ...
.else
   movsx ...
.endif

One cycle more for the extra jump, of course - shall I post timings?
;)

Yes.

Gunther