I implemented this piece of code and was wondering whether there is a quicker or better way to do it:
;variables used in code below
;x2 dd ?
;y2 dd ?
mov eax,lParam ;Get coords from lParam and store in x2 and y2
and eax,0FFFFh ;retrieve LO word
mov x2,eax
mov eax,lParam
and eax,0FFFF0000h ;retrieve HO word
ror eax,16 ;put HO word as LO word
mov y2,eax
Thanks in advance,
Jannes
; DWORD in EAX
mov CX, AX
rol EAX, 16
mov DX, AX
low word = WORD ptr lParam
high word = WORD ptr lParam[2]
Thanks, but do those work? I need the low word and the high word each stored in a dword, not just in a word as your code implies.
Use movzx or movsx according to the type (movzx for unsigned values, movsx for signed ones).
e.g.:
movsx eax,WORD ptr lParam
movsx edx,WORD ptr lParam[2]
mov x,eax
mov y,edx
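For anyone reading along, here is a minimal sketch in C (not from the thread) of what the two choices compute. The helper names word_zx and word_sx are made up for illustration; they model movzx and movsx applied to one 16-bit half of a packed 32-bit value. The difference only shows when bit 15 of a half is set, which can happen with mouse coordinates, since those may be negative on multi-monitor setups.

```c
#include <stdint.h>

/* Hypothetical helper: zero-extend the selected 16-bit half (models movzx). */
static uint32_t word_zx(uint32_t packed, int high)
{
    return high ? (packed >> 16) : (packed & 0xFFFFu);
}

/* Hypothetical helper: sign-extend the selected 16-bit half (models movsx). */
static int32_t word_sx(uint32_t packed, int high)
{
    uint32_t w = word_zx(packed, high);
    /* If bit 15 is set the value is negative: subtract 0x10000. */
    return (w & 0x8000u) ? (int32_t)w - 0x10000 : (int32_t)w;
}
```

With the made-up value 0064FFF6h, the low word comes out as 65526 under zero extension but as -10 under sign extension, while the high word is 100 either way.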
Thank you!
Hi gelatine1,
Both solutions (Hutch's and qWord's) should work fine. Hutch's code may incur a few penalty cycles inside a 32-bit segment, but that's all.
Gunther
You got it in 16-bit form because that is what you asked for: the high and low words of a dword.
What's wrong with the classic:
mov eax,lParam
mov edx,eax
shr eax,16 ;high word
and edx,0FFFFh ;low word
mov y2,eax
mov x2,edx
undoubtedly faster, too
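As a rough cross-check, sketched in C rather than MASM, the shift-and-mask version and the mask-and-rotate version from the first post give the same zero-extended halves. The struct and function names below are assumptions for illustration only. Both variants treat the words as unsigned, so a negative coordinate would show up as a large positive number.

```c
#include <stdint.h>

/* Hypothetical holder for the two unpacked halves. */
struct halves {
    uint32_t lo; /* bits 0..15, zero-extended */
    uint32_t hi; /* bits 16..31, zero-extended */
};

/* Mirrors: mov edx,eax / shr eax,16 / and edx,0FFFFh */
static struct halves split_shift_mask(uint32_t packed)
{
    struct halves h;
    h.hi = packed >> 16;     /* shr eax,16 */
    h.lo = packed & 0xFFFFu; /* and edx,0FFFFh */
    return h;
}

/* Mirrors the first post: and eax,0FFFF0000h followed by ror eax,16 */
static uint32_t hi_mask_rotate(uint32_t packed)
{
    uint32_t v = packed & 0xFFFF0000u;
    return (v >> 16) | (v << 16); /* 32-bit rotate right by 16 */
}
```

For any input, hi_mask_rotate(p) equals split_shift_mask(p).hi, because the mask has already cleared the bits that the rotate would wrap around.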
These are my 2, I would normally use the first.
LOCAL tx :DWORD
LOCAL ty :DWORD
; ------------------------------------- 1
movsx eax, WORD PTR [lParam]
movsx ecx, WORD PTR [lParam+2]
mov tx, eax
mov ty, ecx
print uhex$(tx),"h loword",13,10
print uhex$(ty),"h hiword",13,10
; ------------------------------------- 2
mov edx, lParam
movzx eax, dx
rol edx, 16
movzx ecx, dx
mov tx, eax
mov ty, ecx
print uhex$(tx),"h loword",13,10
print uhex$(ty),"h hiword",13,10
; -------------------------------------
Quote from: hutch-- on November 12, 2014, 08:09:38 AM
These are my 2, I would normally use the first.
Yes, me too. It's a fast and clear solution.
Gunther
Speed-wise I can't see much of a difference. Anyway, in a WndProc it won't matter whether it's 2 cycles or 1.8.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
997 cycles for 500 * movzx
879 cycles for 500 * shr & and FFFFh
899 cycles for 500 * movsx
994 cycles for 500 * movzx
874 cycles for 500 * shr & and FFFFh
897 cycles for 500 * movsx
970 cycles for 500 * movzx
871 cycles for 500 * shr & and FFFFh
898 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
I'm a bit disappointed with the movzx/movsx speed.
I wonder how much of that may be attributed to the unaligned word.
my sister's machine: (I'm in Michigan this week)
Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz (SSE4)
2035 cycles for 500 * movzx
1042 cycles for 500 * shr & and FFFFh
2042 cycles for 500 * movsx
2040 cycles for 500 * movzx
1042 cycles for 500 * shr & and FFFFh
2115 cycles for 500 * movsx
2034 cycles for 500 * movzx
1052 cycles for 500 * shr & and FFFFh
2041 cycles for 500 * movsx
Dave,
You can expect those types of variations on different hardware. My old Core2 Quad behaves much closer to the i7 I am using, but some of the earlier Core2 chips were internally different.
These are the results on my i7.
1442 cycles for 500 * movzx
1453 cycles for 500 * shr & and FFFFh
1450 cycles for 500 * movsx
1443 cycles for 500 * movzx
1453 cycles for 500 * shr & and FFFFh
1451 cycles for 500 * movsx
1443 cycles for 500 * movzx
1453 cycles for 500 * shr & and FFFFh
1451 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
This is the result on an older Core2 quad.
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (SSE4)
2098 cycles for 500 * movzx
1030 cycles for 500 * shr & and FFFFh
2128 cycles for 500 * movsx
2097 cycles for 500 * movzx
1046 cycles for 500 * shr & and FFFFh
2025 cycles for 500 * movsx
2108 cycles for 500 * movzx
1028 cycles for 500 * shr & and FFFFh
2020 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
This is the later quad.
Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (SSE4)
2103 cycles for 500 * movzx
1030 cycles for 500 * shr & and FFFFh
2015 cycles for 500 * movsx
2100 cycles for 500 * movzx
1042 cycles for 500 * shr & and FFFFh
2009 cycles for 500 * movsx
2100 cycles for 500 * movzx
1028 cycles for 500 * shr & and FFFFh
2013 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
While I am at it, this is my old PIV.
Genuine Intel(R) CPU 3.80GHz (SSE3)
1861 cycles for 500 * movzx
2103 cycles for 500 * shr & and FFFFh
1777 cycles for 500 * movsx
1823 cycles for 500 * movzx
2058 cycles for 500 * shr & and FFFFh
1864 cycles for 500 * movsx
1814 cycles for 500 * movzx
2140 cycles for 500 * shr & and FFFFh
2091 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
Without comments
Intel(R) Celeron(R) CPU 2.80GHz (SSE3)
1771 cycles for 500 * movzx
2087 cycles for 500 * shr & and FFFFh
1792 cycles for 500 * movsx
1804 cycles for 500 * movzx
2083 cycles for 500 * shr & and FFFFh
1796 cycles for 500 * movsx
1772 cycles for 500 * movzx
2085 cycles for 500 * shr & and FFFFh
1792 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
With some humour, on my i7 the following 4 algos clock almost identically.
; ------------------------------------- 1
movzx eax, WORD PTR [lParam]
movzx ecx, WORD PTR [lParam+2]
mov tx, eax
mov ty, ecx
print uhex$(tx),"h loword",13,10
print uhex$(ty),"h hiword",13,10
; ------------------------------------- 2
mov edx, lParam
movzx eax, dx
rol edx, 16
movzx ecx, dx
mov tx, eax
mov ty, ecx
print uhex$(tx),"h loword",13,10
print uhex$(ty),"h hiword",13,10
; ------------------------------------- 3
mov edx, lParam
mov ecx, edx
and edx, 0000FFFFh
rol ecx, 16
and ecx, 0000FFFFh
mov tx, edx
mov ty, ecx
print uhex$(tx),"h loword",13,10
print uhex$(ty),"h hiword",13,10
; ------------------------------------- 4
mov edx, lParam
and edx, 0000FFFFh
movzx ecx, WORD PTR [lParam+2]
mov tx, edx
mov ty, ecx
print uhex$(tx),"h loword",13,10
print uhex$(ty),"h hiword",13,10
; -------------------------------------
pre-P4 (SSE1)
2027 cycles for 500 * movzx
2532 cycles for 500 * shr & and FFFFh
2043 cycles for 500 * movsx
2028 cycles for 500 * movzx
2534 cycles for 500 * shr & and FFFFh
2027 cycles for 500 * movsx
2026 cycles for 500 * movzx
2535 cycles for 500 * shr & and FFFFh
2026 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
--- ok ---
pre-P4
4164 cycles for 500 * movzx
2577 cycles for 500 * shr & and FFFFh
4181 cycles for 500 * movsx
4138 cycles for 500 * movzx
2655 cycles for 500 * shr & and FFFFh
4188 cycles for 500 * movsx
4146 cycles for 500 * movzx
2605 cycles for 500 * shr & and FFFFh
4169 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
--- ok ---
Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
1530 cycles for 500 * movzx
2035 cycles for 500 * shr & and FFFFh
1527 cycles for 500 * movsx
1533 cycles for 500 * movzx
2036 cycles for 500 * shr & and FFFFh
1529 cycles for 500 * movsx
1529 cycles for 500 * movzx
2039 cycles for 500 * shr & and FFFFh
1535 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
--- ok ---
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (SSE4)
1004 cycles for 500 * movzx
1378 cycles for 500 * shr & and FFFFh
1013 cycles for 500 * movsx
1003 cycles for 500 * movzx
1379 cycles for 500 * shr & and FFFFh
1004 cycles for 500 * movsx
1020 cycles for 500 * movzx
1236 cycles for 500 * shr & and FFFFh
1363 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
--- ok ---
Jochen,
the results:
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
898 cycles for 500 * movzx
919 cycles for 500 * shr & and FFFFh
918 cycles for 500 * movsx
901 cycles for 500 * movzx
917 cycles for 500 * shr & and FFFFh
918 cycles for 500 * movsx
949 cycles for 500 * movzx
931 cycles for 500 * shr & and FFFFh
924 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
--- ok ---
Gunther
Core2-i3 G3220 under Windows7-64:
Intel(R) Pentium(R) CPU G3220 @ 3.00GHz (SSE4)
1001 cycles for 500 * movzx
1017 cycles for 500 * shr & and FFFFh
1005 cycles for 500 * movsx
1002 cycles for 500 * movzx
1018 cycles for 500 * shr & and FFFFh
1001 cycles for 500 * movsx
1006 cycles for 500 * movzx
1017 cycles for 500 * shr & and FFFFh
1002 cycles for 500 * movsx
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4) Windows 8.1 64 bit
1029 cycles for 500 * movzx
1057 cycles for 500 * shr & and FFFFh
1059 cycles for 500 * movsx
1039 cycles for 500 * movzx
1058 cycles for 500 * shr & and FFFFh
1059 cycles for 500 * movsx
1038 cycles for 500 * movzx
1056 cycles for 500 * shr & and FFFFh
1059 cycles for 500 * movsx
17 bytes for movzx
23 bytes for shr & and FFFFh
17 bytes for movsx
.if OldIntelQuadOrSimilar
shr ...
.else
movsx ...
.endif
One cycle more for the extra jump, of course - shall I post timings?
;)
deleted
I like that, nidud.
Jochen,
Quote from: jj2007 on November 13, 2014, 12:11:14 PM
.if OldIntelQuadOrSimilar
shr ...
.else
movsx ...
.endif
One cycle more for the extra jump, of course - shall I post timings?
;)
Yes.
Gunther