The MASM Forum

General => The Campus => Topic started by: Rav on June 14, 2022, 04:14:23 AM

Title: Having trouble figuring out how to create a mask for an XMM register
Post by: Rav on June 14, 2022, 04:14:23 AM
I have an XMM register consisting of 16 bytes, each byte being the value FF, like so:  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Let's say EAX contains a value from 1 to 15, which represents the number of high order bytes of the XMM register I want to zero out.  So for example:

If EAX was 1, the resulting XMM register would be:   00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
If EAX was 2, the resulting XMM register would be:   0000FFFFFFFFFFFFFFFFFFFFFFFFFFFF
...
...
If EAX was 15, the resulting XMM register would be: 000000000000000000000000000000FF

This has to be done at runtime, so I can't use immediate operands (I had thought of using PSLLDQ and PSRLDQ to shift, but those use imm8).

You may be guessing that what I'm trying to do is create a mask which I will subsequently PAND with another XMM register, and that is exactly what I am doing.  If I could zero out EAX high-order bytes of an XMM register directly without even having to use a mask, that would be even better.

I have been trying to figure out what instruction(s) will do this for me, and keep being stumped.  This must run in 32-bit mode; I am able to use SSE 4.2 instructions.

Thanks for any ideas.  / Rav
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: jj2007 on June 14, 2022, 05:23:36 AM
Self-modifying code would be an option. Thus, you could use the imm8.
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: daydreamer on June 14, 2022, 05:36:03 AM
Self-modifying SSE code is impossible with modern security settings.
Rav, check the pshufb byte shuffle instruction. Besides emulating a shift, it also offers a way to zero bytes.
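
For reference, PSHUFB zeroes a destination byte whenever bit 7 of the corresponding control byte is set, and otherwise copies the source byte selected by the low 4 bits of that control byte. The sketch below is only a scalar C model of that behaviour, not real SIMD code. The names pshufb_model and make_zero_high_ctrl are hypothetical, made up for illustration, and byte 0 is assumed to be the lowest-order byte of the register.

```c
#include <stdint.h>

/* Scalar model of PSHUFB on one 16-byte register (byte 0 = lowest-order byte).
   If bit 7 of a control byte is set, the destination byte becomes 0;
   otherwise the low 4 bits of the control byte pick a source byte. */
void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t ctrl[16])
{
    uint8_t tmp[16];   /* temporary, so dst may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Hypothetical helper: control vector that leaves the low 16-n bytes in
   place and zeroes the n high-order bytes (n in 0..16). */
void make_zero_high_ctrl(uint8_t ctrl[16], unsigned n)
{
    for (unsigned i = 0; i < 16; i++)
        ctrl[i] = (i < 16 - n) ? (uint8_t)i : 0x80;
}
```

Note that the control vector itself still depends on the runtime count, so it has to be computed or loaded from memory before the shuffle can run.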
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: jj2007 on June 14, 2022, 06:07:06 AM
Quote from: daydreamer on June 14, 2022, 05:36:03 AM
Self-modifying SSE code is impossible with modern security settings.

Sure.

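include \masm32\MasmBasic\MasmBasic.inc
MyO OWORD 11223344556677889900AABBCCDDEEFFh
  Init
  movups xmm1, MyO
  deb 4, "start", x:xmm1
  mov ecx, 5
  .Repeat
    mov ch, 0C3h        ; ecx = 0000C3xxh: shift count in cl, retn opcode in ch
    push ecx            ; stack bytes: xx C3 00 00
    mov ch, 0           ; restore ecx to the plain loop counter
    push 0F9730F66h     ; stack bytes: 66 0F 73 F9 = pslldq xmm1, imm8
    call esp            ; run "pslldq xmm1, <cl as imm8>" and "retn" from the stack
    add esp, 8          ; discard the 8 pushed bytes
    deb 4, Str$("#%i", ecx), x:xmm1
    dec ecx
  .Until Zero?
  MsgBox 0, "With a very old OS like Windows 7-64, that works just fine", "Hi", MB_OK
EndOfCode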


Output:
start   x:xmm1          11223344 55667788 9900AABB CCDDEEFF
#5      x:xmm1          66778899 00AABBCC DDEEFF00 00000000
#4      x:xmm1          00AABBCC DDEEFF00 00000000 00000000
#3      x:xmm1          CCDDEEFF 00000000 00000000 00000000
#2      x:xmm1          EEFF0000 00000000 00000000 00000000
#1      x:xmm1          FF000000 00000000 00000000 00000000
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: Rav on June 14, 2022, 06:27:30 AM
Thanks, it was a great idea, but yeah, I got an access violation (I did verify I was overwriting the correct bytes).  Unless there's something I'm doing wrong there, I'll look into pshufb (thanks daydreamer).  Here is the code I tried:

    ; On entry, BL contains the number of bytes to shift.

    call GetEIP                 ; Get EIP (can't access it directly).
    jmp AfterGetEIP             ; Jump over GetEIP code.

    ; Return value of EIP in ESI:
    GetEIP:
    mov esi,[esp]
    ret

    AfterGetEIP:
    add esi,6+ShiftXMM-GetEIP   ; Offset to where the first imm8 value is.
    mov [esi],bl                ; Overwrite first imm8.
    mov [esi+5],bl              ; Overwrite second imm8.

    ShiftXMM:
    PSLLDQ xmm4,91H             ; Placeholder imm8 will be overwritten at runtime by self-modifying code.
    PSRLDQ xmm4,92H             ; Same.
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: jj2007 on June 14, 2022, 06:32:40 AM
Quote from: Rav on June 14, 2022, 06:27:30 AM
Thanks, it was a great idea, but yeah, I got an access violation

Well.... zero downloads of my code means you obviously haven't tested it. My code works like a charm, btw also on Windows 10, so I assume your version has a little problem. Happy bug chasing :tongue:
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: Rav on June 14, 2022, 07:04:05 AM
Quote from: jj2007 on June 14, 2022, 06:32:40 AM
Quote from: Rav on June 14, 2022, 06:27:30 AM
Thanks, it was a great idea, but yeah, I got an access violation

Well.... zero downloads of my code means you obviously haven't tested it. My code works like a charm, btw also on Windows 10, so I assume your version has a little problem. Happy bug chasing :tongue:

Sorry, I didn't see the download file name (it's displayed in a very small font).  I just downloaded it and tried it.  It does appear to work, but as a relative assembly newbie I don't understand the code.  I looked into using pshufb but it (apparently) would require a significant amount of set up before executing it, plus if I understand it correctly it would be doing excess work indexing (and yet not actually moving) the bytes I'm NOT zeroing.  I don't know if that description made sense.  At any rate, I'm going to try a different approach: table-driving the PAND mask.  There are only 15 possible masks that I need (00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, 0000FFFFFFFFFFFFFFFFFFFFFFFFFFFF ... 0000000000000000000000000000FFFF, and 000000000000000000000000000000FF), each 128 bits (16 bytes) long, so a table of 240 bytes doesn't concern me.  I'll just offset to one of the masks in memory and PAND with it.  At least that's the theory.  And it should be very fast.
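
A rough sketch of the table-driven idea described above, written as a scalar C model rather than MASM. The names mask_table, init_mask_table and zero_high_bytes are hypothetical, and byte 0 is assumed to be the lowest-order byte, matching how an XMM register is laid out in memory.

```c
#include <stdint.h>

/* mask_table[n-1] is the mask that zeroes the n high-order bytes, n = 1..15.
   15 masks of 16 bytes each = 240 bytes in total. */
uint8_t mask_table[15][16];

/* Fill the table once at startup (it could equally be static data). */
void init_mask_table(void)
{
    for (unsigned n = 1; n <= 15; n++)
        for (unsigned i = 0; i < 16; i++)
            mask_table[n - 1][i] = (i < 16 - n) ? 0xFF : 0x00;
}

/* Scalar stand-in for a PAND against the selected table entry. */
void zero_high_bytes(uint8_t reg[16], unsigned n)
{
    const uint8_t *mask = mask_table[n - 1];   /* byte offset (n-1)*16 */
    for (unsigned i = 0; i < 16; i++)
        reg[i] &= mask[i];
}
```

In assembly the lookup would presumably come down to scaling the count by 16 to index the table, followed by a single PAND. Using the table entry directly as a memory operand would require it to be 16-byte aligned.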
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: jj2007 on June 14, 2022, 07:07:56 AM
Quote from: Rav on June 14, 2022, 07:04:05 AM
I'm going to try a different approach: table-driving the PAND mask.  There are only 15 possible masks that I need (00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF, 0000FFFFFFFFFFFFFFFFFFFFFFFFFFFF ... 0000000000000000000000000000FFFF, and 000000000000000000000000000000FF), each 128 bits (16 bytes) long, so a table of 240 bytes doesn't concern me.  I'll just offset to one of the masks in memory and PAND with it.  At least that's the theory.  And it should be very fast.

Yep, that sounds like a good idea :thumbsup:

Quote from: Rav on June 14, 2022, 07:04:05 AM
I don't understand the code

Let's have a look under the hood (the int 3 is to stop the debugger at this precise point):

  mov ecx, 5
  int 3
  .Repeat
    mov ch, 0C3h
    push ecx
    mov ch, 0
    push 0F9730F66h
    call esp ; pslldq xmm1, 4
    add esp, 8
    dec ecx
  .Until Zero?


Address   Hex dump          Command
004011C2  |.  B9 05000000   mov ecx, 5
004011C7  |.  CC            int3
004011C8  |>  B5 C3         /mov ch, 0C3
004011CA  |.  51            |push ecx
004011CB  |.  B5 00         |mov ch, 0
004011CD  |.  68 660F73F9   |push F9730F66
004011D2  |.  FFD4          |call esp  <<<<<<<<<<<<<<<
004011D4  |.  83C4 08       |add esp, 8
004011D7  |.  49            |dec ecx
004011D8  |.^ 75 EE         \jnz short 004011C8


When calling esp, the cpu finds the following at esp:
0018FF84    660F73F9 04     pslldq xmm1, 4
0018FF89    C3              retn
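
To make the byte layout concrete, the following C sketch emulates the two pushes from the loop and shows which bytes end up at the top of the stack. It is a model only and executes nothing. The names push32 and build_stub are hypothetical, and the 8-byte array merely stands in for the real stack.

```c
#include <stdint.h>
#include <stddef.h>

/* Emulate a 32-bit x86 PUSH: the stack grows downward and the value is
   stored little-endian. 'sp' is an index into 'stack' standing in for ESP. */
size_t push32(uint8_t *stack, size_t sp, uint32_t value)
{
    sp -= 4;
    for (int i = 0; i < 4; i++)
        stack[sp + i] = (uint8_t)(value >> (8 * i));
    return sp;
}

/* Reproduce the two pushes of one loop iteration for a given count in CL.
   Returns the final stack index; the stub starts there. */
size_t build_stub(uint8_t stack[8], uint8_t count)
{
    size_t sp = 8;                          /* empty 8-byte stack */
    uint32_t ecx = count;                   /* loop counter, CH = 0 */
    ecx = (ecx & ~0xFF00u) | 0xC300u;       /* mov ch, 0C3h (CH = bits 8..15) */
    sp = push32(stack, sp, ecx);            /* push ecx        -> count C3 00 00 */
    sp = push32(stack, sp, 0xF9730F66u);    /* push 0F9730F66h -> 66 0F 73 F9 */
    return sp;
}
```

For a count of 4, the six bytes at the final stack index come out as 66 0F 73 F9 04 C3, which is the pslldq xmm1, 4 / retn pair shown in the dump above. The last two bytes (00 00) are never reached.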
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: Rav on June 14, 2022, 07:40:11 AM
jj2007, I THINK I understand, but I'm not sure.  You're actually pushing the instruction code (and the imm8 value) for pslldq onto the stack, then the call jumps to it and the CPU executes it there?  In other words, by calling esp, you're calling into the stack area rather than the code area?  Is that right?  And is that NOT an access violation because it's not modifying anything in the .code segment, but is modifying the stack, which is in the .data segment?  If that's right, is it perfectly permissible to execute code that resides in the .data segment rather than the .code segment?  I think I may still be confused.
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: jj2007 on June 14, 2022, 08:34:04 AM
You do understand - that's exactly what happens :tongue:
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: Rav on June 14, 2022, 09:20:25 AM
Quote from: jj2007 on June 14, 2022, 08:34:04 AM
You do understand - that's exactly what happens :tongue:

Thanks, and thanks for taking the time to explain.  / Rav
Title: Re: Having trouble figuring out how to create a mask for an XMM register
Post by: InfiniteLoop on June 18, 2022, 05:25:45 PM
Let rax = number of high-order bytes to zero (0 to 16), xmm0 = input vector

Attempt 1:
movdqu xmm1, xmmword ptr [MaskA + rax] ; unaligned load: 16-rax bytes of FF, then rax bytes of 00
pand xmm0,xmm1
ret
MaskA QWORD 0FFFFFFFFFFFFFFFFh,0FFFFFFFFFFFFFFFFh,0,0


I can't think of a way to create a mask using fewer than 5 cycles.
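
The sliding-window load in Attempt 1 can be modelled in portable C as follows. This is a scalar sketch, not SIMD code, and the names mask_window and zero_high_bytes_window are hypothetical. The point is that one 32-byte table covers every count from 0 to 16, because an unaligned 16-byte load at offset n picks up 16-n bytes of FF followed by n bytes of 00.

```c
#include <stdint.h>
#include <string.h>

/* 16 bytes of FF followed by 16 bytes of 00, laid out like MaskA. */
const uint8_t mask_window[32] = {
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
    /* remaining 16 bytes are zero-initialized */
};

/* Scalar stand-in for: movdqu xmm1, [MaskA + n] / pand xmm0, xmm1
   with n in 0..16. Byte 0 is the lowest-order byte of the register. */
void zero_high_bytes_window(uint8_t reg[16], unsigned n)
{
    uint8_t mask[16];
    memcpy(mask, mask_window + n, 16);   /* unaligned 16-byte load at offset n */
    for (unsigned i = 0; i < 16; i++)
        reg[i] &= mask[i];
}
```

In 32-bit mode, which the original question requires, the same addressing would presumably use eax instead of rax as the index.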