Topic: quickest way to retrieve HO and LO words of a dword?

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10316
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #15 on: November 13, 2014, 12:22:38 AM »
Dave,

You can expect those types of variations on different hardware. My old Core2 quad behaves much closer to the i7 I am using, but some of the earlier Core2 chips were internally different.

These are the results on my i7.

Code:
1442    cycles for 500 * movzx
1453    cycles for 500 * shr & and FFFFh
1450    cycles for 500 * movsx

1443    cycles for 500 * movzx
1453    cycles for 500 * shr & and FFFFh
1451    cycles for 500 * movsx

1443    cycles for 500 * movzx
1453    cycles for 500 * shr & and FFFFh
1451    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx

This is the result on an older Core2 quad.

Code:
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (SSE4)

2098    cycles for 500 * movzx
1030    cycles for 500 * shr & and FFFFh
2128    cycles for 500 * movsx

2097    cycles for 500 * movzx
1046    cycles for 500 * shr & and FFFFh
2025    cycles for 500 * movsx

2108    cycles for 500 * movzx
1028    cycles for 500 * shr & and FFFFh
2020    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx

This is the later quad.

Code:
Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)

2103    cycles for 500 * movzx
1030    cycles for 500 * shr & and FFFFh
2015    cycles for 500 * movsx

2100    cycles for 500 * movzx
1042    cycles for 500 * shr & and FFFFh
2009    cycles for 500 * movsx

2100    cycles for 500 * movzx
1028    cycles for 500 * shr & and FFFFh
2013    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx

While I am at it, this is my old PIV.

Code:
Genuine Intel(R) CPU 3.80GHz (SSE3)

1861    cycles for 500 * movzx
2103    cycles for 500 * shr & and FFFFh
1777    cycles for 500 * movsx

1823    cycles for 500 * movzx
2058    cycles for 500 * shr & and FFFFh
1864    cycles for 500 * movsx

1814    cycles for 500 * movzx
2140    cycles for 500 * shr & and FFFFh
2091    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx


hutch at movsd dot com
http://www.masm32.com

TouEnMasm

  • Member
  • *****
  • Posts: 1764
    • EditMasm
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #16 on: November 13, 2014, 12:32:45 AM »
No comment:
Quote
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)

1771    cycles for 500 * movzx
2087    cycles for 500 * shr & and FFFFh
1792    cycles for 500 * movsx

1804    cycles for 500 * movzx
2083    cycles for 500 * shr & and FFFFh
1796    cycles for 500 * movsx

1772    cycles for 500 * movzx
2085    cycles for 500 * shr & and FFFFh
1792    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx
Fa is a musical note to play with CL

hutch--

Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #17 on: November 13, 2014, 12:59:30 AM »
With some humour, on my i7 the following 4 algos clock almost identically.

Code:
  ; ------------------------------------- 1
  ; both words read straight from memory, zero-extended

    movzx eax, WORD PTR [lParam]        ; eax = LO word
    movzx ecx, WORD PTR [lParam+2]      ; ecx = HO word

    mov tx, eax
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; ------------------------------------- 2
  ; one memory load, then rotate the HO word down into dx

    mov edx, lParam

    movzx eax, dx                       ; eax = LO word
    rol edx, 16                         ; swap the two halves of edx
    movzx ecx, dx                       ; ecx = HO word

    mov tx, eax
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; ------------------------------------- 3
  ; one memory load, register copy, rotate and mask

    mov edx, lParam

    mov ecx, edx
    and edx, 0000FFFFh                  ; edx = LO word
    rol ecx, 16                         ; swap the two halves of ecx
    and ecx, 0000FFFFh                  ; ecx = HO word

    mov tx, edx
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; ------------------------------------- 4
  ; mask for the LO word, memory read for the HO word

    mov edx, lParam

    and edx, 0000FFFFh                  ; edx = LO word
    movzx ecx, WORD PTR [lParam+2]      ; ecx = HO word

    mov tx, edx
    mov ty, ecx
    print uhex$(tx),"h  loword",13,10
    print uhex$(ty),"h  hiword",13,10

  ; -------------------------------------
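
One thing the timings do not show: movzx and movsx only give the same result while bit 15 of the word is clear. Once the word is 8000h or above, movsx fills the upper half of the register with ones. Below is a minimal sketch of that difference, assuming the MASM32 SDK is installed at its default \masm32 path. The variable names (dwValue, loZero, hiZero, hiSign) and the test value are made up for illustration and are not taken from the test bed used in this thread.

```asm
include \masm32\include\masm32rt.inc    ; assumption: default MASM32 SDK install path

.data
    dwValue dd 0FFF60014h               ; hypothetical test value: HO word = FFF6h, LO word = 0014h
    loZero  dd 0                        ; LO word, zero-extended
    hiZero  dd 0                        ; HO word, zero-extended
    hiSign  dd 0                        ; HO word, sign-extended

.code
start:
    movzx eax, WORD PTR [dwValue]       ; LO word, upper 16 bits cleared
    movzx ecx, WORD PTR [dwValue+2]     ; HO word, upper 16 bits cleared
    movsx edx, WORD PTR [dwValue+2]     ; HO word, upper 16 bits copied from bit 15

    ; store first, because the print macro calls into the library
    ; and does not preserve eax, ecx and edx
    mov loZero, eax
    mov hiZero, ecx
    mov hiSign, edx

    print uhex$(loZero),"h  loword (movzx)",13,10
    print uhex$(hiZero),"h  hiword (movzx)",13,10
    print uhex$(hiSign),"h  hiword (movsx)",13,10

    exit
end start
```

Built as a console app, the movsx line should show the upper 16 bits set while the movzx line leaves them clear. If lParam carries packed signed values (mouse coordinates, for example), that is the reason to pick movsx; for plain unsigned words, movzx or the and/shr forms are the ones to compare.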

FORTRANS

  • Member
  • *****
  • Posts: 1227
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #18 on: November 13, 2014, 01:36:37 AM »
pre-P4 (SSE1)

2027   cycles for 500 * movzx
2532   cycles for 500 * shr & and FFFFh
2043   cycles for 500 * movsx

2028   cycles for 500 * movzx
2534   cycles for 500 * shr & and FFFFh
2027   cycles for 500 * movsx

2026   cycles for 500 * movzx
2535   cycles for 500 * shr & and FFFFh
2026   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

pre-P4
4164   cycles for 500 * movzx
2577   cycles for 500 * shr & and FFFFh
4181   cycles for 500 * movsx

4138   cycles for 500 * movzx
2655   cycles for 500 * shr & and FFFFh
4188   cycles for 500 * movsx

4146   cycles for 500 * movzx
2605   cycles for 500 * shr & and FFFFh
4169   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)

1530   cycles for 500 * movzx
2035   cycles for 500 * shr & and FFFFh
1527   cycles for 500 * movsx

1533   cycles for 500 * movzx
2036   cycles for 500 * shr & and FFFFh
1529   cycles for 500 * movsx

1529   cycles for 500 * movzx
2039   cycles for 500 * shr & and FFFFh
1535   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)

1004   cycles for 500 * movzx
1378   cycles for 500 * shr & and FFFFh
1013   cycles for 500 * movsx

1003   cycles for 500 * movzx
1379   cycles for 500 * shr & and FFFFh
1004   cycles for 500 * movsx

1020   cycles for 500 * movzx
1236   cycles for 500 * shr & and FFFFh
1363   cycles for 500 * movsx

17   bytes for movzx
23   bytes for shr & and FFFFh
17   bytes for movsx


--- ok ---

Gunther

  • Member
  • *****
  • Posts: 4159
  • Forgive your enemies, but never forget their names
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #19 on: November 13, 2014, 04:00:36 AM »
Jochen,

the results:
Quote
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)

898     cycles for 500 * movzx
919     cycles for 500 * shr & and FFFFh
918     cycles for 500 * movsx

901     cycles for 500 * movzx
917     cycles for 500 * shr & and FFFFh
918     cycles for 500 * movsx

949     cycles for 500 * movzx
931     cycles for 500 * shr & and FFFFh
924     cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx

--- ok ---

Gunther
You have to know the facts before you can distort them.

MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1196
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #20 on: November 13, 2014, 04:26:11 AM »
Core2-i3 G3220 under Windows7-64:

Intel(R) Pentium(R) CPU G3220 @ 3.00GHz (SSE4)

1001    cycles for 500 * movzx
1017    cycles for 500 * shr & and FFFFh
1005    cycles for 500 * movsx

1002    cycles for 500 * movzx
1018    cycles for 500 * shr & and FFFFh
1001    cycles for 500 * movsx

1006    cycles for 500 * movzx
1017    cycles for 500 * shr & and FFFFh
1002    cycles for 500 * movsx
Well Microsoft, here’s another nice mess you’ve gotten us into.

Siekmanski

  • Member
  • *****
  • Posts: 2622
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #21 on: November 13, 2014, 09:18:33 AM »
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4) Windows 8.1 64 bit

1029    cycles for 500 * movzx
1057    cycles for 500 * shr & and FFFFh
1059    cycles for 500 * movsx

1039    cycles for 500 * movzx
1058    cycles for 500 * shr & and FFFFh
1059    cycles for 500 * movsx

1038    cycles for 500 * movzx
1056    cycles for 500 * shr & and FFFFh
1059    cycles for 500 * movsx

17      bytes for movzx
23      bytes for shr & and FFFFh
17      bytes for movsx
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 13661
  • Assembly is fun ;-)
    • MasmBasic
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #22 on: November 13, 2014, 12:11:14 PM »
.if OldIntelQuadOrSimilar
   shr ...
.else
   movsx ...
.endif

One cycle more for the extra jump, of course - shall I post timings?
 ;)

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #23 on: November 13, 2014, 09:50:19 PM »
deleted
« Last Edit: February 25, 2022, 09:01:16 AM by nidud »

dedndave

  • Member
  • *****
  • Posts: 8828
  • Still using Abacus 2.0
    • DednDave
Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #24 on: November 13, 2014, 10:52:47 PM »
I like that, nidud.

Gunther

Re: quickest way to retrieve HO and LO words of a dword ?
« Reply #25 on: November 14, 2014, 04:09:53 AM »
Jochen,

Quote
.if OldIntelQuadOrSimilar
   shr ...
.else
   movsx ...
.endif

One cycle more for the extra jump, of course - shall I post timings?
 ;)

Yes.

Gunther