For unknown reasons, Intel never bothered to give SIMD instructions a bitwise shift. So I am trying to do that with a pair of 32-bit registers (reg32). It works, but it looks clumsy and slow. Is there a more elegant way to do that? I thought of shrd, but it doesn't do the job :(
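include \masm32\include\masm32rt.inc

.code
start:
  mov edi, 80000000h          ; high dword: start with the top bit set
  xor ebx, ebx                ; low dword
  xor esi, esi                ; loop counter
  .Repeat
    print str$(esi), 9
    print hex$(edi), 32
    print hex$(ebx), 13, 10
    .if esi>31                ; the bit is already in the low dword
      shr ebx, 1
    .elseif Zero?             ; esi==31: move the bit across by hand
      mov ebx, 80000000h
      dec edi                 ; edi is 1 here, so this clears it
    .else
      shr edi, 1              ; the bit is still in the high dword
    .endif
    inc esi
  .Until esi>63
  exit
end start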
Output:
0 80000000 00000000
1 40000000 00000000
2 20000000 00000000
3 10000000 00000000
4 08000000 00000000
5 04000000 00000000
6 02000000 00000000
7 01000000 00000000
8 00800000 00000000
9 00400000 00000000
10 00200000 00000000
11 00100000 00000000
12 00080000 00000000
13 00040000 00000000
14 00020000 00000000
15 00010000 00000000
16 00008000 00000000
17 00004000 00000000
18 00002000 00000000
19 00001000 00000000
20 00000800 00000000
21 00000400 00000000
22 00000200 00000000
23 00000100 00000000
24 00000080 00000000
25 00000040 00000000
26 00000020 00000000
27 00000010 00000000
28 00000008 00000000
29 00000004 00000000
30 00000002 00000000
31 00000001 00000000
32 00000000 80000000
33 00000000 40000000
34 00000000 20000000
35 00000000 10000000
36 00000000 08000000
37 00000000 04000000
38 00000000 02000000
39 00000000 01000000
40 00000000 00800000
41 00000000 00400000
42 00000000 00200000
43 00000000 00100000
44 00000000 00080000
45 00000000 00040000
46 00000000 00020000
47 00000000 00010000
48 00000000 00008000
49 00000000 00004000
50 00000000 00002000
51 00000000 00001000
52 00000000 00000800
53 00000000 00000400
54 00000000 00000200
55 00000000 00000100
56 00000000 00000080
57 00000000 00000040
58 00000000 00000020
59 00000000 00000010
60 00000000 00000008
61 00000000 00000004
62 00000000 00000002
63 00000000 00000001
shrd edx,ebx,1
shr ebx,1
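A minimal sketch of how these two instructions could replace the branching in the loop above. The snippet above keeps the high dword in ebx and the low dword in edx; the sketch below instead assumes the layout of the original listing (edi high, ebx low), because the print macros call library code that may overwrite edx. shrd shifts the low dword right and pulls bit 0 of the high dword into bit 31; shr then shifts the high dword itself.

```asm
include \masm32\include\masm32rt.inc

.code
start:
  mov edi, 80000000h          ; high dword (assumed layout: edi:ebx)
  xor ebx, ebx                ; low dword
  xor esi, esi                ; loop counter
  .Repeat
    print str$(esi), 9
    print hex$(edi), 32
    print hex$(ebx), 13, 10
    shrd ebx, edi, 1          ; low dword >> 1, bit 0 of edi enters bit 31
    shr edi, 1                ; then shift the high dword
    inc esi
  .Until esi>63
  exit
end start
```

The order matters: shrd has to read edi before shr changes it. The loop should walk the single bit through all 64 positions just as the listing above does, with no special case at the dword boundary.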
Quote from: jj2007 on January 06, 2015, 06:18:07 PM
For unknown reasons, Intel never bothered to give SIMD instructions a bitwise shift.
Actually, they did that with SSE2.
SHRD will do the job, as sinsi shows
but it brings up an interesting question - is there a SIMD way that's faster? :biggrin:
in the old forum there was a "stir fry" bit swap that might give some ideas
another approach might be to isolate every other bit, then subtract a constant, causing a borrow
PSRLQ mm, imm8
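A rough sketch of the MMX form in use, not a tuned implementation. The variable name qw and the round trip through memory on every pass are assumptions made only so the value can be printed; the .586 and .mmx lines are there to enable the MMX mnemonics. psrlq does a logical right shift of the whole 64-bit quadword by a bit count. In the SSE2 form on an xmm register, each 64-bit half is shifted separately.

```asm
include \masm32\include\masm32rt.inc
.586
.mmx                          ; enable MMX mnemonics

.data
qw  dd 0, 80000000h           ; hypothetical test value: low dword, high dword

.code
start:
  xor esi, esi                ; loop counter
  .Repeat
    mov edi, dword ptr qw+4   ; high dword, for printing
    mov ebx, dword ptr qw     ; low dword, for printing
    print str$(esi), 9
    print hex$(edi), 32
    print hex$(ebx), 13, 10
    movq mm0, qword ptr qw    ; load all 64 bits
    psrlq mm0, 1              ; one logical right shift across the quadword
    movq qword ptr qw, mm0    ; store the result back
    emms                      ; leave MMX state before the next library call
    inc esi
  .Until esi>63
  exit
end start
```

In real code the value would stay in mm0 between shifts; the store and emms inside the loop are only there because of the printing.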
Quote from: sinsi on January 06, 2015, 06:45:22 PM
PSRLQ mm, imm8
I knew I had missed something - thanks for refreshing my memory!
After some googling, I had found "Missing instruction in SSE: PSLLDQ with _bit_ shift amount?" (https://software.intel.com/en-us/forums/topic/303111), where Intel apologises for not having implemented bitwise shifts because it's difficult 8)
Will check now whether shrd edx,ebx,1 plus shr ebx,1 is faster than psrlq. Thanks to everybody for the quick help :t