The MASM Forum

General => The Campus => Topic started by: InfiniteLoop on January 21, 2020, 05:21:31 AM

Title: Gather instructions
Post by: InfiniteLoop on January 21, 2020, 05:21:31 AM
Hello all,
How are the gather instructions useful and how do they work exactly?

The Intel instruction reference is awful.
e.g.
VGATHERDPS ymm0, [rcx + ymm1 * 4 + 32], ymm2
rcx = address of some array of float values. 32 is an offset. I have no idea what ymm1 does or how *4 affects it. Testing it just seems to return every even element i.e. 0,2,4,6,8
ymm2 is a mask. I don't understand how masks are ever of any use, so that can be ignored and left as all ones.
Can I use this to extract elements say 1, 235, 245, 346, 564, 566, 700, 800 from some float array in the correct order?
If not then it seems somewhat useless.

Title: Re: Gather instructions
Post by: LiaoMi on January 21, 2020, 06:19:53 AM
Hi,

        .model flat,c
        .code

; extern "C" void AvxGatherFloat_(YmmVal* des, YmmVal* indices, YmmVal* mask, const float* x);
;
; Description:  The following function demonstrates use of the
;               vgatherdps instruction.
;
; Requires:     AVX2

AvxGatherFloat_ proc
        push ebp
        mov ebp,esp
        push ebx

; Load argument values. The contents of des are loaded into ymm0
; prior to execution of the vgatherdps instruction in order to
; demonstrate the conditional effects of the control mask.
        mov eax,[ebp+8]                     ;eax = ptr to des
        mov ebx,[ebp+12]                    ;ebx = ptr to indices
        mov ecx,[ebp+16]                    ;ecx = ptr to mask
        mov edx,[ebp+20]                    ;edx = ptr to x
        vmovaps ymm0,ymmword ptr [eax]      ;ymm0 = des (initial values)
        vmovdqa ymm1,ymmword ptr [ebx]      ;ymm1 = indices
        vmovdqa ymm2,ymmword ptr [ecx]      ;ymm2 = mask

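; Perform the gather operation and save the results. Each lane is loaded
; from [edx + index*4] only if the sign bit of its mask element is set;
; the instruction clears the mask register when it completes.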
        vgatherdps ymm0,[edx+ymm1*4],ymm2   ;ymm0 = gathered elements
        vmovaps ymmword ptr [eax],ymm0      ;save des
        vmovdqa ymmword ptr [ebx],ymm1      ;save indices (unchanged)
        vmovdqa ymmword ptr [ecx],ymm2      ;save mask (all zeros)

        vzeroupper
        pop ebx
        pop ebp
        ret
AvxGatherFloat_ endp


https://github.com/Apress/modern-x86-assembly-language-programming/tree/master/978-1-4842-0065-0_SourceCode/Chapter16/AvxGather
https://books.google.ch/books?id=plInCgAAQBAJ&pg=PA468&lpg=PA468&ots=-MKx-7Fo2x&sig=ACfU3U3sCgSWIuZD5WQgViLqnz2SpdP_Qw&hl=en
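
For anyone landing here later, the addressing can be modelled in plain C. This is only a sketch of the documented semantics, not code from the book, and gather_dps_model / gather_demo are made-up names. Each of the 8 lanes loads one float from base + index*scale + disp: ymm1 holds eight signed 32-bit indices, *4 is the scale in bytes (sizeof(float)), and the +32 in the original question is a byte displacement, i.e. it skips 8 floats. A lane is loaded only if the sign bit of its mask element is set; other lanes keep the old destination value, and the mask register is zeroed when the instruction completes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of: vgatherdps ymm_dst, [base + ymm_idx*scale + disp], ymm_mask
   Illustrative only; the names are hypothetical, not part of any library. */
static void gather_dps_model(float dst[8], const void *base,
                             const int32_t idx[8], int scale, int32_t disp,
                             uint32_t mask[8])
{
    /* The hardware encoding only allows these scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (int lane = 0; lane < 8; ++lane) {
        if (mask[lane] & 0x80000000u) {       /* only the sign bit is tested */
            const unsigned char *addr = (const unsigned char *)base
                                      + (ptrdiff_t)idx[lane] * scale + disp;
            memcpy(&dst[lane], addr, sizeof(float));
        }
        mask[lane] = 0;                       /* mask ends up all zeros */
    }
}

/* Assumed test data: x[i] == i, gathered with the indices from the question. */
static void gather_demo(float out[8])
{
    static float x[1024];
    const int32_t idx[8] = {1, 235, 245, 346, 564, 566, 700, 800};
    uint32_t mask[8];

    for (int i = 0; i < 1024; ++i)
        x[i] = (float)i;
    for (int i = 0; i < 8; ++i) {
        mask[i] = 0xFFFFFFFFu;                /* load every lane */
        out[i] = -1.0f;
    }
    gather_dps_model(out, x, idx, 4, 0, mask);
}
```

With x[i] = i, gather_demo leaves 1, 235, 245, 346, 564, 566, 700, 800 in out, in that order, so arbitrary indices in any order are fine. Indices 0..7 with a scale of 8 instead of 4 would give x[0], x[2], x[4], ..., which is one possible explanation for the "every even element" result, though the post does not show enough to be sure. Note that the real instruction faults (#UD) if the destination, index and mask registers are not three different registers.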
Title: Re: Gather instructions
Post by: aw27 on January 21, 2020, 07:37:33 AM
Quote from: InfiniteLoop on January 21, 2020, 05:21:31 AM
Hello all,
How are the gather instructions useful and how do they work exactly?

The Intel instruction reference is awful.
e.g.
VGATHERDPS ymm0, [rcx + ymm1 * 4 + 32], ymm2
rcx = address of some array of float values. 32 is an offset. I have no idea what ymm1 does or how *4 affects it. Testing it just seems to return every even element i.e. 0,2,4,6,8
ymm2 is a mask. I don't understand how masks are ever of any use, so that can be ignored and left as all ones.
Can I use this to extract elements say 1, 235, 245, 346, 564, 566, 700, 800 from some float array in the correct order?
If not then it seems somewhat useless.

Transposing a matrix
http://masm32.com/board/index.php?topic=7503.msg81974#msg81974
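
The linked thread has the actual code; what follows is just a scalar C sketch of why gather fits a transpose, not the method from that post. For a row-major matrix, column c lives at element indices c, c + ncols, c + 2*ncols, ..., so one index vector plus one vgatherdps with scale 4 pulls in 8 elements of a column, which can then be stored as a row. column_indices and transpose8x8_model are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 8 element indices of column `col`, rows row0..row0+7, for a
   row-major matrix with `ncols` columns. These are the values that would be
   loaded into the index register (ymm1 in the example above). */
static void column_indices(int32_t idx[8], int32_t row0, int32_t col,
                           int32_t ncols)
{
    assert(ncols > 0 && col >= 0 && col < ncols);
    for (int lane = 0; lane < 8; ++lane)
        idx[lane] = (row0 + lane) * ncols + col;
}

/* Scalar model of an 8x8 transpose done as "gather a column, store a row". */
static void transpose8x8_model(float dst[64], const float src[64])
{
    for (int32_t col = 0; col < 8; ++col) {
        int32_t idx[8];
        column_indices(idx, 0, col, 8);
        /* One vgatherdps with these indices would perform these 8 loads. */
        for (int lane = 0; lane < 8; ++lane)
            dst[col * 8 + lane] = src[idx[lane]];
    }
}
```

For column 1 of an 8x8 matrix the indices come out as 1, 9, 17, 25, 33, 41, 49, 57. Whether this beats a shuffle-based transpose depends on the CPU, so it needs measuring.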
Title: Re: Gather instructions
Post by: InfiniteLoop on January 21, 2020, 11:26:16 PM
I understand now, thanks. It could be useful.

Matrix transpose is already fast enough without gather. Multiplication is where performance starts tanking.
Title: Re: Gather instructions
Post by: aw27 on January 22, 2020, 01:44:45 AM
Quote from: InfiniteLoop on January 21, 2020, 11:26:16 PM
Matrix transpose is already fast enough without gather. Multiplication is where performance starts tanking.
I agree.