Move a number to an XMM register

Started by jj2007, February 11, 2018, 12:18:46 PM

jj2007

Some special ways to move a number into an XMM register:
xorps xmm0, xmm0 ; set to zero, 3 bytes

pxor xmm0, xmm0 ; set to zero, 4 bytes

pcmpeqb xmm0, xmm0 ; set to -1 = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFFh, 4 bytes

m2m edx, 127 ; MASM32 macro: push 127 / pop edx
movd xmm0, edx ; set to 1 ... 127, 7 bytes

movd xmm0, MyDword ; set to any DWORD range number, 8+4=12 bytes

movups xmm0, MyOword ; 7+16=23 bytes
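For readers coming from C, the same register setups can be written with the SSE/SSE2 intrinsics from `<emmintrin.h>`. This is a minimal sketch that assumes an x86/x86-64 compiler with SSE2 enabled. The helper names (`xmm_zero`, `xmm_all_ones`, and so on) are hypothetical. The compiler is free to pick a different instruction (for example, it may materialize all-ones with `pcmpeqd` instead of `pcmpeqb`), so the byte counts above apply only to hand-written asm.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; also brings in SSE (xmmintrin.h) */
#include <stdint.h>

/* Copy an XMM value into four dwords, d[0] = lowest dword. */
static void xmm_dwords(__m128i x, uint32_t d[4])
{
    _mm_storeu_si128((__m128i *)d, x);
}

/* xorps xmm0, xmm0: zero, float view */
static __m128 xmm_zero_ps(void)
{
    return _mm_setzero_ps();
}

/* pxor xmm0, xmm0: zero, integer view */
static __m128i xmm_zero(void)
{
    return _mm_setzero_si128();
}

/* pcmpeqb xmm0, xmm0: every byte equals itself, so all 128 bits become 1 */
static __m128i xmm_all_ones(void)
{
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi8(z, z);
}

/* movd xmm0, edx: 32-bit value into the low dword, upper 96 bits cleared */
static __m128i xmm_from_dword(int32_t v)
{
    return _mm_cvtsi32_si128(v);
}

/* movups xmm0, MyOword: unaligned 16-byte load from memory */
static __m128 xmm_load_oword(const float *p)
{
    return _mm_loadu_ps(p);
}
```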

daydreamer

PUNPCKLBW, PUNPCKLWD, PUNPCKLDQ:
I found out that by using the same register for both operands, you can unpack the same DWORD/REAL4, WORD or byte value to fill a full XMM register:
movd xmm0,eax
PUNPCKLDQ xmm0,xmm0
Sorry, but it only works with MMX registers.
Maybe, combined with other unpack instructions, it works for those who don't have SSSE3's PSHUFB, to fill an XMM register with one byte.
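One way to fill an XMM register with a single byte without SSSE3's PSHUFB is to combine the unpacks with PSHUFD, which is SSE2. This is a sketch in C intrinsics, assuming SSE2, and the function names are hypothetical. Each unpack with the same register on both sides doubles the number of copies in the low lanes, until dword 0 holds four copies of the byte. PSHUFD with immediate 0 then copies dword 0 into all four dwords.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 only: no SSSE3 PSHUFB needed */
#include <stdint.h>

/* Fill all 16 bytes of an XMM register with one byte, SSE2 only. */
static __m128i broadcast_byte_sse2(uint8_t b)
{
    __m128i x = _mm_cvtsi32_si128(b); /* movd:          byte 0  = b          */
    x = _mm_unpacklo_epi8(x, x);      /* punpcklbw x,x: word 0  = b b        */
    x = _mm_unpacklo_epi16(x, x);     /* punpcklwd x,x: dword 0 = b b b b    */
    return _mm_shuffle_epi32(x, 0);   /* pshufd x,x,0:  dword 0 to all four  */
}

/* Same idea for a WORD: one unpack fewer. */
static __m128i broadcast_word_sse2(uint16_t w)
{
    __m128i x = _mm_cvtsi32_si128(w); /* movd:          word 0  = w          */
    x = _mm_unpacklo_epi16(x, x);     /* punpcklwd x,x: dword 0 = w w        */
    return _mm_shuffle_epi32(x, 0);   /* pshufd x,x,0:  dword 0 to all four  */
}
```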

I wonder: if I start a worker thread, are the XMM registers already zeroed or not?

My non-asm creations:
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

How can you fill a "full" XMM register? In my tests, the unpack fills only the lower or upper half, but PSHUFD does the full job:

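include \masm32\MasmBasic\MasmBasic.inc         ; requires the MasmBasic library
  Init
  mov eax, 12345678h
  movd xmm0,eax                 ; xmm0 dwords (high to low): 0, 0, 0, 12345678h
  PUNPCKLDQ xmm0,xmm0           ; interleave the low dwords with themselves: low qword only
  deb 4, "PUNPCKLDQ", x:xmm0
  movd xmm0,eax
  PSHUFD xmm0,xmm0, 0           ; imm8 = 0 picks dword 0 for all four lanes
  deb 4, "PSHUFD   ", x:xmm0
EndOfCode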


Result:
PUNPCKLDQ       x:xmm0          00000000 00000000 12345678 12345678
PSHUFD          x:xmm0          12345678 12345678 12345678 12345678
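The same experiment can be sketched in C with SSE2 intrinsics, using hypothetical helper names. The display above lists the highest dword first, so `d[0]` below corresponds to the rightmost column. The third helper is an addition not in the post. It shows that following PUNPCKLDQ with PUNPCKLQDQ (SSE2), which copies the low qword into the high qword, also fills the whole register using unpacks only.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* movd + punpckldq xmm0,xmm0: dwords 0 and 1 are interleaved with themselves.
   movd cleared dword 1, so only the low 64 bits end up holding v. */
static __m128i fill_punpckldq(int32_t v)
{
    __m128i x = _mm_cvtsi32_si128(v);
    return _mm_unpacklo_epi32(x, x);
}

/* movd + pshufd xmm0,xmm0,0: imm8 = 0 selects source dword 0 for all lanes. */
static __m128i fill_pshufd(int32_t v)
{
    __m128i x = _mm_cvtsi32_si128(v);
    return _mm_shuffle_epi32(x, 0);
}

/* Unpack-only variant: punpcklqdq copies the filled low qword upward. */
static __m128i fill_unpack2(int32_t v)
{
    __m128i x = fill_punpckldq(v);
    return _mm_unpacklo_epi64(x, x);
}
```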


> wonder if I start a workerthread, if xmm regs are already zeroed or not?

It seems so, at least on Win7-64, but since it's not a documented feature, you'd better not rely on it 8)

daydreamer

You can fill XMM registers with lookup tables (LUTs) of the most useful mathematical functions. When you don't need much precision, you can take two neighbouring points and compute (x1+x2)/2 to approximate a point between them.
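The midpoint trick maps well onto SIMD, because two overlapping unaligned loads give four neighbour pairs at once. This is a sketch in C with SSE intrinsics. The function name and the table layout (any float table sampled at even steps, such as sine values) are assumptions for illustration, not the poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE: _mm_loadu_ps, _mm_add_ps, _mm_mul_ps */

/* Approximate the four values halfway between neighbouring entries
   lut[i..i+3] and lut[i+1..i+4] as (x1 + x2) / 2, four pairs at once. */
static void lut_midpoints4(const float *lut, size_t n, size_t i, float out[4])
{
    assert(i + 4 < n);                     /* needs entries up to lut[i+4] */
    __m128 x1 = _mm_loadu_ps(lut + i);     /* lut[i]   .. lut[i+3]         */
    __m128 x2 = _mm_loadu_ps(lut + i + 1); /* lut[i+1] .. lut[i+4]         */
    __m128 sum = _mm_add_ps(x1, x2);
    _mm_storeu_ps(out, _mm_mul_ps(sum, _mm_set1_ps(0.5f)));
}
```

Multiplying by 0.5 instead of dividing by 2 gives the same result here and avoids the slower divide instruction.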
It's also useful for flipping the signs of several values at once, e.g. to turn a curve negative, which makes zeroing a register even more useful:
xorps xmm0,xmm0
subps xmm0,xmm1 ; xmm1 contains positive constants, e.g. for a sine curve; xmm0 = 0 - xmm1
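Here is the same negation sketched in C intrinsics, with hypothetical function names. Subtracting from a zeroed register negates all four floats at once. A common alternative, not mentioned in the post, XORs the sign bits with a -0.0f mask. The two differ for a +0.0 input, which stays +0.0 with the subtraction but becomes -0.0 with the XOR.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE intrinsics */

/* The posted idiom: xorps xmm0,xmm0 then subps xmm0,xmm1 -> xmm0 = 0 - xmm1 */
static __m128 negate_sub(__m128 v)
{
    return _mm_sub_ps(_mm_setzero_ps(), v);
}

/* Alternative: flip only the sign bits (xorps with a mask of -0.0f) */
static __m128 negate_xor(__m128 v)
{
    return _mm_xor_ps(v, _mm_set1_ps(-0.0f));
}
```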


jj2007

Quote from: daydreamer on February 13, 2018, 05:27:22 AM
you can fill xmm regs with LUT of most useful mathematical functions, and when you dont need precision you can take two neighbouring Points and (x1+x2)/2 to get approximation of a Point between them

Sinus() uses a variant of this method. About 7 times faster than the FPU's fsin, and equally precise, i.e. REAL10 precision.

daydreamer

Quote from: jj2007 on February 13, 2018, 05:37:48 AM
Sinus() uses a variant of this method. About 7 times faster than the FPU's fsin, and equally precise, i.e. REAL10 precision.

MasmBasic looks nice. It would be nice to try a graphics interface: sprites? A tile engine?
