How to pass an integer input parameter to shufps

Started by KradMoonRa, March 30, 2018, 04:03:07 AM


KradMoonRa

Hi,

I'm stuck on this one.

How can I pass an input parameter as the imm8 operand of the shufps instruction?


_TEXT segment
align 16
uXm_xmm_shuffle_ps proto UX_VECCALL (xmmword) ;InXmm_A:xmmword, InXmm_B:xmmword, _Imm8:dword

align 16
uXm_xmm_shuffle_ps proc UX_VECCALL (xmmword) frame ;InXmm_A:xmmword, InXmm_B:xmmword, _Imm8:dword

local _Imm8:dword
mov _Imm8, eparam1 ;64bits-ecx/edi,32bits-ecx
shufps xmm0, xmm1, _Imm8 ;Invalid operation with shufps

ret
uXm_xmm_shuffle_ps endp
_TEXT ends
The uasmlib

Siekmanski

It has to be an immediate 8-bit value (imm8).
You cannot use locals, globals or registers.

Shuffle MACRO V0,V1,V2,V3
    EXITM %((V0 shl 6) or (V1 shl 4) or (V2 shl 2) or (V3))
ENDM

With a macro:
    shufps  xmm0,xmm0,Shuffle(1,3,2,0)
or direct:
    shufps  xmm0,xmm0,01111000b
Creative coders use backward thinking techniques as a strategy.
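
A minimal sketch in C of what the Shuffle macro computes and what the resulting imm8 selects. `SHUFFLE_IMM8`, `shuffle_imm8` and `shufps_model` are hypothetical names used only for illustration, and the model is plain C rather than the real instruction, so it only mirrors the documented lane selection of shufps.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as the Shuffle macro: four 2-bit selectors in one byte. */
#define SHUFFLE_IMM8(v0, v1, v2, v3) \
    ((uint8_t)(((v0) << 6) | ((v1) << 4) | ((v2) << 2) | (v3)))

/* Checked variant for values only known at run time (illustrative helper). */
static uint8_t shuffle_imm8(unsigned v0, unsigned v1, unsigned v2, unsigned v3)
{
    assert(v0 < 4 && v1 < 4 && v2 < 4 && v3 < 4);
    return SHUFFLE_IMM8(v0, v1, v2, v3);
}

/* Plain-C model of "shufps a, b, imm8": the two low result lanes come from
   the first operand, the two high result lanes from the second operand. */
static void shufps_model(float out[4], const float a[4], const float b[4],
                         uint8_t imm8)
{
    float r[4];
    r[0] = a[imm8 & 3];          /* bits 1:0 select from a */
    r[1] = a[(imm8 >> 2) & 3];   /* bits 3:2 select from a */
    r[2] = b[(imm8 >> 4) & 3];   /* bits 5:4 select from b */
    r[3] = b[(imm8 >> 6) & 3];   /* bits 7:6 select from b */
    for (int i = 0; i < 4; ++i)
        out[i] = r[i];           /* copy last, so out may alias a or b */
}
```

With these definitions, `SHUFFLE_IMM8(1, 3, 2, 0)` evaluates to 01111000b, the same constant as in the direct form.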

jj2007

Quote from: Siekmanski on March 30, 2018, 04:42:38 AM
It has to be an immediate 8-bit value (imm8).
You cannot use locals, globals or registers.

Nothing is impossible in assembler 8)

include \masm32\MasmBasic\MasmBasic.inc         ; download
.data
src OWORD 11223344556677889900AABBCCDDEEFFh

  Init
  movups xmm1, src
  mov ecx, 8C200h

  mov cl, 00011011b             ; the "immediate" parameter in a register ;-)

  push ecx
  mov eax, 0C1700f66h
  push eax
  call esp
  deb 4, "shuffled!!", x:xmm1, x:xmm0
EndOfCode


shuffled!
x:xmm1          11223344 55667788 9900AABB CCDDEEFF
x:xmm0          CCDDEEFF 9900AABB 55667788 11223344
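
To see why the two pushes form executable code, it may help to lay out the bytes they leave on the stack. The sketch below only builds and inspects those eight bytes in plain C and never executes them; `build_pshufd_stub` is a hypothetical helper name. The decoding in the comments (66 0F 70 C1 ib = pshufd xmm0, xmm1, imm8 and C2 08 00 = ret 8) follows the standard x86 encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Reproduce the 8 bytes found at [esp] after "push ecx" / "push eax". */
static void build_pshufd_stub(uint8_t stub[8], uint8_t imm8)
{
    assert(stub != 0);
    uint32_t first_push  = 0x0008C200u | imm8;  /* ecx: imm8, C2 08 00 (ret 8)  */
    uint32_t second_push = 0xC1700F66u;         /* eax: 66 0F 70 C1 (pshufd...) */

    /* The value pushed last sits at the lower address, so it executes first.
       Each dword is stored little-endian, lowest byte first. */
    for (int i = 0; i < 4; ++i) {
        stub[i]     = (uint8_t)(second_push >> (8 * i));
        stub[4 + i] = (uint8_t)(first_push  >> (8 * i));
    }
}
```

For imm8 = 00011011b the stub reads 66 0F 70 C1 1B C2 08 00: the pshufd with the chosen selector, followed by a ret 8 that also removes the two pushed dwords.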

daydreamer

I think it's impossible now with DEP to change an immediate value;
you get a GPF if you try.
Check PSHUFB instead: it takes its shuffle control values from an xmm register and can do the job,
but it's an SSSE3 instruction.
PSHUFB xmm1, xmm2/m128 ; second operand controls how the bytes are shuffled
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding
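
As a rough sketch of the PSHUFB idea, a run-time imm8 can be expanded into a 16-byte control mask. The code below is a plain-C model, not the instruction itself, and the names `imm8_to_pshufb_mask` and `pshufb_model` are made up for illustration. It assumes the single-source case (as in pshufd, or shufps with both operands equal); a two-source shufps would need additional work that is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Expand four 2-bit dword selectors into a byte-granular pshufb control mask. */
static void imm8_to_pshufb_mask(uint8_t mask[16], uint8_t imm8)
{
    assert(mask != 0);
    for (int lane = 0; lane < 4; ++lane) {
        unsigned sel = (imm8 >> (2 * lane)) & 3u;    /* source dword for this lane */
        for (int byte = 0; byte < 4; ++byte)
            mask[4 * lane + byte] = (uint8_t)(4 * sel + byte);
    }
}

/* Plain-C model of pshufb: each control byte picks one source byte,
   or zero when its top bit is set. */
static void pshufb_model(uint8_t out[16], const uint8_t in[16],
                         const uint8_t mask[16])
{
    uint8_t r[16];
    for (int i = 0; i < 16; ++i)
        r[i] = (mask[i] & 0x80) ? 0 : in[mask[i] & 0x0F];
    for (int i = 0; i < 16; ++i)
        out[i] = r[i];                               /* out may alias in */
}
```

Because the mask lives in a register or in memory, it can be computed or looked up at run time, which is exactly what the imm8 operand does not allow.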

Siekmanski

Quote
Nothing is impossible in assembler 8)

Hi JJ, can you write me a routine to increase my bank account?  :biggrin:

Another way is to create an executable data section and write the imm8 value directly into the encoded shufps instruction in memory.
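
A sketch of that one-byte patch, modelled in plain C on an ordinary byte buffer. The template bytes assume the standard encoding 0F C6 C1 ib for shufps xmm0, xmm1, imm8 followed by C3 (ret); `make_patched_shufps` and `IMM8_OFFSET` are hypothetical names. Making such a buffer executable is OS specific and deliberately left out here.

```c
#include <assert.h>
#include <stdint.h>

/* shufps xmm0, xmm1, imm8 ; ret   (imm8 placeholder at offset 3) */
static const uint8_t SHUFPS_TEMPLATE[5] = { 0x0F, 0xC6, 0xC1, 0x00, 0xC3 };
#define IMM8_OFFSET 3

/* Copy the template and overwrite only the immediate byte. */
static void make_patched_shufps(uint8_t code[5], uint8_t imm8)
{
    for (int i = 0; i < 5; ++i)
        code[i] = SHUFPS_TEMPLATE[i];
    assert(code[0] == 0x0F && code[1] == 0xC6);   /* still a shufps opcode */
    code[IMM8_OFFSET] = imm8;                     /* the single-byte write */
}
```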

jj2007

Quote from: Siekmanski on March 30, 2018, 07:07:48 AM
Hi JJ, can you write me a routine to increase my bank account?  :biggrin:

Not my league, Marinus, sorry. But rumours say that at the end of the Cold War, many good programmers lost their jobs in the military-industrial complex, and found better paid ones at Wall Street :icon_cool:

Quote
Another way is, create a piece of executable data section, and write the imm8 value right into the shufps mnemonic memory.

But that is exactly what my code does... only that I used pshufd instead of shufps:
  movups xmm1, src
  mov ecx, 8C200h
  mov cl, 00011011b ; the "immediate" parameter in a register ;-)
  push ecx
  push 0C1700f66h
  call esp


P.S.: The x64 equivalent - and that one stumbles indeed over DEP (but there is a working solution in the attachment):
  mov rax, 0C1700F660008C200h
  mov al, 00011011b ; the "immediate" parameter in a register ;-)
  rol rax, 32
  push rax
  call rsp
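
The register arithmetic in the x64 variant can be checked in plain C. This sketch only reproduces the value that ends up in rax and its byte order in memory; it does not execute anything, and `rol64`, `build_stub_qword` and `qword_to_bytes` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left, as "rol rax, n" does for 0 < n < 64. */
static uint64_t rol64(uint64_t v, unsigned n)
{
    assert(n > 0 && n < 64);
    return (v << n) | (v >> (64 - n));
}

/* Mirror "mov rax, const / mov al, imm8 / rol rax, 32". */
static uint64_t build_stub_qword(uint8_t imm8)
{
    uint64_t rax = 0xC1700F660008C200ull;
    rax = (rax & ~0xFFull) | imm8;   /* mov al, imm8 replaces the low byte */
    return rol64(rax, 32);           /* swap the two dword halves          */
}

/* Byte order as seen in memory after "push rax" (little-endian). */
static void qword_to_bytes(uint8_t out[8], uint64_t q)
{
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)(q >> (8 * i));
}
```

After the rotate, the bytes in memory are again 66 0F 70 C1 imm8 C2 08 00, the same sequence as in the 32-bit version.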

Siekmanski

Hi Jochen,
OK, but I meant a memory write of only 1 byte (self-modifying code, SMC).  8)

Hi KradMoonRa,
Why the need for a "shufps" procedure with 3 inputs (2 xmm regs, 1 imm8)?
Is there a reason you need to have a variable imm8 in memory?

jj2007

Quote from: Siekmanski on March 30, 2018, 10:18:28 PM
I meant only 1 byte memory write. (SMC)

Did you manage to do that in 64-bit land? My solution works, but it is admittedly a bit clumsy.

Siekmanski

I haven't tried it with masm 64 bit.
The only 64 bit experience I have is setting Masm64 up to use it with RadASM and run some of Hutch's examples...

KradMoonRa

Hi @Siekmanski @jj2007 @daydreamer,

Awesome, thank you.

@Siekmanski
Quote
Why the need for a "shufps" procedure with 3 inputs ( 2 xmm regs, 1 imm8 ).
Is there a reason you need to have a variable imm8 in memory?

I'm recreating the SSE functions in asm and exporting them to C/C++, so that they are available together with the intrinsic struct in the library.
After some hours of trying to figure out how to pass a register value as the constant imm8 operand (I had not yet realized that it is a byte value), my best bet at the time was to store it in 32-bit memory.

With your recommendations, I managed to do something like this.



;xmm4shuffle(0:3<<6,0:3<<4,0:3<<2,0:3)
xmm4shuffle0000 equ 0
xmm4shuffle0001 equ 1
xmm4shuffle0002 equ 2
xmm4shuffle0003 equ 3
xmm4shuffle0010 equ 4
xmm4shuffle0011 equ 5
........
........
........
xmm4shuffle3333 equ 255


uXm_xmm4shuffled_ps macro reg0, reg1, reg2
.switch reg2
.case xmm4shuffle0000
shufps reg0, reg1, xmm4shuffle0000
.break
.case xmm4shuffle0001
shufps reg0, reg1, xmm4shuffle0001
.break
.case xmm4shuffle0002
shufps reg0, reg1, xmm4shuffle0002
.break
.case xmm4shuffle0003
shufps reg0, reg1, xmm4shuffle0003
.break
.case xmm4shuffle0010
shufps reg0, reg1, xmm4shuffle0010
.break
.case xmm4shuffle0011
shufps reg0, reg1, xmm4shuffle0011
.break
..........
..........
..........
.case xmm4shuffle3333
shufps reg0, reg1, xmm4shuffle3333
.break
.endswitch
endm

_TEXT segment
align 16
uXm_xmm_shuffle_ps proto UX_VECCALL (xmmword) ;InXmm_A:xmmword, InXmm_B:xmmword, _Imm8:dword

align 16
uXm_xmm_shuffle_ps proc UX_VECCALL (xmmword) frame ;InXmm_A:xmmword, InXmm_B:xmmword, _Imm8:dword

uXm_xmm4shuffled_ps xmm0, xmm1, rparam3

ret
uXm_xmm_shuffle_ps endp
_TEXT ends


It works, but I can probably do better?

The library is linked in my signature.
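
For comparison, here is a hedged sketch in C of the same "one constant per case" dispatch, with the 256 cases generated by nested macros instead of being written out by hand. `SHUFPS_CONST` is a plain-C stand-in for a shufps with a constant immediate, and all names are hypothetical; this is not the library's code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for one "shufps reg0, reg1, <constant>" instruction. */
#define SHUFPS_CONST(out, a, b, imm) do {                                 \
        float t0 = (a)[(imm) & 3],        t1 = (a)[((imm) >> 2) & 3];     \
        float t2 = (b)[((imm) >> 4) & 3], t3 = (b)[((imm) >> 6) & 3];     \
        (out)[0] = t0; (out)[1] = t1; (out)[2] = t2; (out)[3] = t3;       \
    } while (0)

/* Each level quadruples the number of generated cases: 1, 4, 16, 64. */
#define CASE1(n)  case (n): SHUFPS_CONST(out, a, b, (n)); break;
#define CASE4(n)  CASE1(n)  CASE1((n) + 1)   CASE1((n) + 2)   CASE1((n) + 3)
#define CASE16(n) CASE4(n)  CASE4((n) + 4)   CASE4((n) + 8)   CASE4((n) + 12)
#define CASE64(n) CASE16(n) CASE16((n) + 16) CASE16((n) + 32) CASE16((n) + 48)

/* Run-time imm8 in, constant-immediate shuffle out. */
static void shuffle_ps_runtime(float out[4], const float a[4],
                               const float b[4], uint8_t imm8)
{
    assert(out != 0 && a != 0 && b != 0);
    switch (imm8) {
        CASE64(0) CASE64(64) CASE64(128) CASE64(192)
    }
}
```

The design point is the same as in the macro above: every case body sees its selector as a compile-time constant, while the caller may pass any byte value.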

KradMoonRa

I'm going to sync the library to GitHub. In the meantime, it's available with the latest changes.

Siekmanski

I'm not familiar with your coding language, but maybe something like this is possible.

#define uXm_XMM_SHUFFLE(V0,V1,V2,V3) (((V0) << 6) | ((V1) << 4) | ((V2) << 2) | ((V3)))

uXm_xmm_shufps macro reg0, reg1, sp0, sp1, sp2, sp3

    shufps  reg0,reg1,uXm_XMM_SHUFFLE(sp0,sp1,sp2,sp3)

endm


Or you could try to use a jump table with all 256 imm8 entries instead of the switch/case approach.
It should be a lot faster, because the dispatch becomes a single indexed jump instead of a chain of comparisons.
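
A rough C sketch of the jump-table idea: 256 tiny functions, each with its own constant selector, reached through one indexed call. The X-macro names (`DEF`, `ENT`, `ROW`, `ALL`), `SHUF_TABLE` and `shufps_model` are all hypothetical, and the shuffle body is a plain-C model standing in for a real shufps with a constant immediate.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*shuf_fn)(float out[4], const float a[4], const float b[4]);

/* Plain-C model of shufps with the given selector. */
static void shufps_model(float out[4], const float a[4], const float b[4],
                         unsigned imm8)
{
    float r0 = a[imm8 & 3],        r1 = a[(imm8 >> 2) & 3];
    float r2 = b[(imm8 >> 4) & 3], r3 = b[(imm8 >> 6) & 3];
    out[0] = r0; out[1] = r1; out[2] = r2; out[3] = r3;
}

/* DEF(h, l) defines shuf_hl with the constant 0xhl; ENT(h, l) names it. */
#define DEF(h, l) \
    static void shuf_##h##l(float out[4], const float a[4], const float b[4]) \
    { shufps_model(out, a, b, 0x##h##l); }
#define ENT(h, l) shuf_##h##l,

/* ROW walks the low hex digit, ALL walks the high hex digit. */
#define ROW(X, h) X(h,0) X(h,1) X(h,2) X(h,3) X(h,4) X(h,5) X(h,6) X(h,7) \
                  X(h,8) X(h,9) X(h,A) X(h,B) X(h,C) X(h,D) X(h,E) X(h,F)
#define ALL(X) ROW(X,0) ROW(X,1) ROW(X,2) ROW(X,3) ROW(X,4) ROW(X,5) \
               ROW(X,6) ROW(X,7) ROW(X,8) ROW(X,9) ROW(X,A) ROW(X,B) \
               ROW(X,C) ROW(X,D) ROW(X,E) ROW(X,F)

ALL(DEF)                                               /* 256 functions   */
static const shuf_fn SHUF_TABLE[256] = { ALL(ENT) };   /* index == imm8   */

/* One table lookup and one indirect call replace the comparison chain. */
static void shuffle_ps_table(float out[4], const float a[4],
                             const float b[4], uint8_t imm8)
{
    assert(SHUF_TABLE[imm8] != 0);
    SHUF_TABLE[imm8](out, a, b);
}
```

In assembly the table would hold code addresses and the dispatch would be an indirect jump, but the indexing scheme is the same.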

habran

Hi KradMoonRa :biggrin:

It is not necessary to use .break in the switch block; UASM takes care of that 8)
You can just write it like this:

.switch reg2
.case xmm4shuffle0000
shufps reg0, reg1, xmm4shuffle0000
.case xmm4shuffle0001
shufps reg0, reg1, xmm4shuffle0001
.case xmm4shuffle0002
shufps reg0, reg1, xmm4shuffle0002
.case xmm4shuffle0003
shufps reg0, reg1, xmm4shuffle0003
.case xmm4shuffle0010
shufps reg0, reg1, xmm4shuffle0010
.case xmm4shuffle0011
shufps reg0, reg1, xmm4shuffle0011
..........
..........
..........
.case xmm4shuffle3333
shufps reg0, reg1, xmm4shuffle3333
.endswitch
Cod-Father

KradMoonRa

Hi,

I came up with this after some hours of studying the debug source of the object file. It produces better results, but I'm thinking about a .for/.endf solution, which would probably produce smaller and faster code. I'm still researching .for.


uXm_xmm4shuffled2_ps macro reg0, reg1, reg2

.if((bparam1 >= 240) && (bparam1 <= 255))
jmp xmm4jelabel_16x16
.endif

.if((bparam1 >= 224) && (bparam1 <= 239))
jmp xmm4jelabel_15x16
.endif
........
........
........
........
.if((bparam1 >= 0) && (bparam1 <= 15))
jmp xmm4jelabel_1x16
.endif

xmm4jelabel_1x16:
.if(reg2 == xmm4shuffle0000)
shufps reg0, reg1, xmm4shuffle0000
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0001)
shufps reg0, reg1, xmm4shuffle0001
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0002)
shufps reg0, reg1, xmm4shuffle0002
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0003)
shufps reg0, reg1, xmm4shuffle0003
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0010)
shufps reg0, reg1, xmm4shuffle0010
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0011)
shufps reg0, reg1, xmm4shuffle0011
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0012)
shufps reg0, reg1, xmm4shuffle0012
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0013)
shufps reg0, reg1, xmm4shuffle0013
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0020)
shufps reg0, reg1, xmm4shuffle0020
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0021)
shufps reg0, reg1, xmm4shuffle0021
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0022)
shufps reg0, reg1, xmm4shuffle0022
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0023)
shufps reg0, reg1, xmm4shuffle0023
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0030)
shufps reg0, reg1, xmm4shuffle0030
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0031)
shufps reg0, reg1, xmm4shuffle0031
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0032)
shufps reg0, reg1, xmm4shuffle0032
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle0033)
shufps reg0, reg1, xmm4shuffle0033
jmp xmm4shuffle_END
.endif
........
........
........
xmm4jelabel_16x16:
.if(reg2 == xmm4shuffle3300)
shufps reg0, reg1, xmm4shuffle3300
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3301)
shufps reg0, reg1, xmm4shuffle3301
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3302)
shufps reg0, reg1, xmm4shuffle3302
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3303)
shufps reg0, reg1, xmm4shuffle3303
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3310)
shufps reg0, reg1, xmm4shuffle3310
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3311)
shufps reg0, reg1, xmm4shuffle3311
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3312)
shufps reg0, reg1, xmm4shuffle3312
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3313)
shufps reg0, reg1, xmm4shuffle3313
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3320)
shufps reg0, reg1, xmm4shuffle3320
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3321)
shufps reg0, reg1, xmm4shuffle3321
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3322)
shufps reg0, reg1, xmm4shuffle3322
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3323)
shufps reg0, reg1, xmm4shuffle3323
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3330)
shufps reg0, reg1, xmm4shuffle3330
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3331)
shufps reg0, reg1, xmm4shuffle3331
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3332)
shufps reg0, reg1, xmm4shuffle3332
jmp xmm4shuffle_END
.endif
.if(reg2 == xmm4shuffle3333)
shufps reg0, reg1, xmm4shuffle3333
jmp xmm4shuffle_END
.endif
xmm4shuffle_END:
endm


_TEXT segment
align 16
uXm_xmm_shuffle_ps proto UX_VECCALL (xmmword) ;InXmm_A:xmmword, InXmm_B:xmmword, _Imm8:byte

align 16
uXm_xmm_shuffle_ps proc UX_VECCALL (xmmword) frame ;InXmm_A:xmmword, InXmm_B:xmmword, _Imm8:byte

uXm_xmm4shuffled2_ps  xmm0, xmm1, rparam3

ret
uXm_xmm_shuffle_ps endp
_TEXT ends
