The MASM Forum
General => The Campus => Topic started by: jj2007 on February 11, 2018, 12:18:46 PM

Some special ways to move a number into an XMM register:
xorps xmm0, xmm0 ; set to zero, 3 bytes
pxor xmm0, xmm0 ; set to zero, 4 bytes
pcmpeqb xmm0, xmm0 ; set all bits, i.e. -1 = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFFh, 4 bytes
m2m edx, 127 ; MASM32 macro: push 127, pop edx
movd xmm0, edx ; set to 1 ... 127, 7 bytes including the m2m
movd xmm0, MyDword ; set to any DWORD range number, 8+4=12 bytes
movups xmm0, MyOword ; 7+16=23 bytes
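
For readers who want to check what ends up in the register, here is a minimal sketch of the first three cases using C SSE2 intrinsics instead of MASM. The function names are made up for illustration, and a compiler is free to pick a different encoding than the instruction named in each comment, so the byte counts above apply to hand-written assembly only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Copy the four 32-bit lanes of an XMM value to memory, lowest lane first. */
void store_lanes(__m128i v, uint32_t out[4]) {
    _mm_storeu_si128((__m128i *)out, v);
}

/* Corresponds to pxor xmm0, xmm0: every bit cleared. */
__m128i xmm_zero(void) {
    return _mm_setzero_si128();
}

/* Corresponds to pcmpeqb xmm0, xmm0: each byte equals itself, so every bit is set. */
__m128i xmm_all_ones(void) {
    __m128i any = _mm_setzero_si128();  /* the input value does not matter */
    return _mm_cmpeq_epi8(any, any);
}

/* Corresponds to movd xmm0, edx: low dword = value, upper three dwords zeroed. */
__m128i xmm_from_dword(uint32_t value) {
    return _mm_cvtsi32_si128((int)value);
}
```

Calling store_lanes() on each result shows four zero dwords, four FFFFFFFFh dwords, and the value in the lowest dword only.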

PUNPCKLBW, PUNPCKLWD, PUNPCKLDQ:
I found out that by using the same register for both operands, you can replicate the same dword/REAL4, word or byte value across a register:
movd xmm0,eax
PUNPCKLDQ xmm0,xmm0
Sorry, but a single unpack fills the whole register only with MMX regs; in an XMM reg it fills just the lower half.
Maybe combined with other unpacks it works for those who don't have SSSE3 PSHUFB to fill an XMM reg with one byte.
I wonder: if I start a worker thread, are the XMM regs already zeroed or not?
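
The idea of chaining unpacks to fill an XMM register with one byte without PSHUFB can be sketched as follows. This is an illustration in C SSE2 intrinsics rather than MASM, the function name is hypothetical, and the comments name the instruction each intrinsic stands for. Every step unpacks the register with itself and doubles the number of filled bytes.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Replicate one byte into all 16 bytes of an XMM value using SSE2 only. */
__m128i broadcast_byte_sse2(uint8_t b) {
    __m128i v = _mm_cvtsi32_si128(b);  /* movd       xmm0, eax  : b in byte 0      */
    v = _mm_unpacklo_epi8(v, v);       /* punpcklbw  xmm0, xmm0 : b in bytes 0..1  */
    v = _mm_unpacklo_epi16(v, v);      /* punpcklwd  xmm0, xmm0 : b in bytes 0..3  */
    v = _mm_unpacklo_epi32(v, v);      /* punpckldq  xmm0, xmm0 : b in bytes 0..7  */
    v = _mm_unpacklo_epi64(v, v);      /* punpcklqdq xmm0, xmm0 : b in bytes 0..15 */
    return v;
}
```

The last two unpacks could also be replaced by a single pshufd xmm0, xmm0, 0, which is SSE2 as well.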

How can you fill a "full" xmm reg? In my tests it fills either the lower or upper half... but pshufd does the full job:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
Init
mov eax, 12345678h
movd xmm0,eax
PUNPCKLDQ xmm0,xmm0
deb 4, "PUNPCKLDQ", x:xmm0
movd xmm0,eax
PSHUFD xmm0,xmm0, 0
deb 4, "PSHUFD ", x:xmm0
EndOfCode
Result:
PUNPCKLDQ x:xmm0 00000000 00000000 12345678 12345678
PSHUFD x:xmm0 12345678 12345678 12345678 12345678
> I wonder: if I start a worker thread, are the XMM regs already zeroed or not?
It seems so, at least on Win7-64, but if it's not a documented feature, you'd better not rely on it 8)
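
The PUNPCKLDQ versus PSHUFD comparison above can also be reproduced outside MasmBasic. The following is a rough equivalent in C SSE2 intrinsics; the function names are invented for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* movd + punpckldq xmm0, xmm0: the dword lands in lanes 0 and 1, lanes 2 and 3 stay zero. */
__m128i fill_low_half_punpckldq(uint32_t x) {
    __m128i v = _mm_cvtsi32_si128((int)x);
    return _mm_unpacklo_epi32(v, v);
}

/* movd + pshufd xmm0, xmm0, 0: lane 0 is copied into all four lanes. */
__m128i fill_all_pshufd(uint32_t x) {
    __m128i v = _mm_cvtsi32_si128((int)x);
    return _mm_shuffle_epi32(v, 0);
}
```

Storing both results with _mm_storeu_si128 gives the value in the two lowest dwords for the first function and in all four dwords for the second, which matches the dump above (deb prints the highest dword first).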

You can fill XMM regs with a LUT of the most useful mathematical functions, and when you don't need precision you can take two neighbouring points and compute (x1+x2)/2 to approximate a point between them.
It's also useful to flip the signs of several values at once, to get the negative half of a curve.
That way your zeroing gets more useful:
xorps xmm0,xmm0
subps xmm0,xmm1 ; xmm1 contains positive constants, for example for a sine curve
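
A minimal sketch of this zero-then-subtract trick in C SSE intrinsics (not MASM; the function name is hypothetical) shows that it computes 0 - x in each of the four lanes:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Negate four packed floats by subtracting them from zero. */
__m128 negate_ps(__m128 x) {
    __m128 zero = _mm_setzero_ps();  /* xorps xmm0, xmm0 */
    return _mm_sub_ps(zero, x);      /* subps xmm0, xmm1 : 0 - x per lane */
}
```

One detail to keep in mind: a lane holding 0.0 comes out as +0.0, not -0.0, because 0 - 0 is +0 in the default rounding mode.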

> You can fill XMM regs with a LUT of the most useful mathematical functions, and when you don't need precision you can take two neighbouring points and compute (x1+x2)/2 to approximate a point between them.
Sinus() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1334) uses a variant of this method. About 7 times faster than the FPU's fsin, and equally precise, i.e. REAL10 precision.
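
To make the quoted midpoint idea concrete, here is a rough sketch in C SSE intrinsics. It is not how Sinus() works internally (that is only described above as "a variant"); the five-entry table and all names are assumptions made up for this example. It averages four pairs of neighbouring table entries in one go.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical coarse table: sin at 0, 30, 60, 90 and 120 degrees. */
const float sine_lut[5] = { 0.0f, 0.5f, 0.8660254f, 1.0f, 0.8660254f };

/* out[i] = (lut[i] + lut[i+1]) / 2 for i = 0..3,
   i.e. rough estimates for 15, 45, 75 and 105 degrees with the table above. */
void lut_midpoints(const float lut[5], float out[4]) {
    __m128 x1  = _mm_loadu_ps(lut);      /* entries 0..3 */
    __m128 x2  = _mm_loadu_ps(lut + 1);  /* entries 1..4 */
    __m128 mid = _mm_mul_ps(_mm_add_ps(x1, x2), _mm_set1_ps(0.5f));
    _mm_storeu_ps(out, mid);
}
```

With such a coarse table the estimates are about 3.4% too low (0.25 instead of sin 15° = 0.2588, 0.683 instead of sin 45° = 0.7071), so a denser table or a better interpolation is needed for anything beyond a rough curve.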

> You can fill XMM regs with a LUT of the most useful mathematical functions, and when you don't need precision you can take two neighbouring points and compute (x1+x2)/2 to approximate a point between them.
> Sinus() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1334) uses a variant of this method. About 7 times faster than the FPU's fsin, and equally precise, i.e. REAL10 precision.
MasmBasic looks nice. It would be nice to try a graphics interface: sprites? A tile engine?