Gather instructions

Started by InfiniteLoop, January 21, 2020, 05:21:31 AM

InfiniteLoop

Hello all,
How are the gather instructions useful and how do they work exactly?

The Intel instruction reference is awful.  :undecided:
e.g.
VGATHERDPS ymm0, [rcx + ymm1 * 4 + 32], ymm2
rcx = address of an array of float values and 32 is an offset. I have no idea what ymm1 does or how the *4 affects it. In testing it just seems to return every even element, i.e. 0, 2, 4, 6, 8.
ymm2 is a mask. I don't understand how masks are ever of any use, so I assume it can be ignored and left as all ones.
Can I use this to extract, say, elements 1, 235, 245, 346, 564, 566, 700, 800 from some float array in the correct order?
If not, then it seems somewhat useless.


LiaoMi

Hi,

        .model flat,c
        .code

; extern "C" void AvxGatherFloat_(YmmVal* des, YmmVal* indices, YmmVal* mask, const float* x);
;
; Description:  The following function demonstrates use of the
;               vgatherdps instruction.
;
; Requires:     AVX2

AvxGatherFloat_ proc
        push ebp
        mov ebp,esp
        push ebx

; Load argument values. The contents of des are loaded into ymm0
; prior to execution of the vgatherdps instruction in order to
; demonstrate the conditional effects of the control mask.
        mov eax,[ebp+8]                     ;eax = ptr to des
        mov ebx,[ebp+12]                    ;ebx = ptr to indices
        mov ecx,[ebp+16]                    ;ecx = ptr to mask
        mov edx,[ebp+20]                    ;edx = ptr to x
        vmovaps ymm0,ymmword ptr [eax]      ;ymm0 = des (initial values)
        vmovdqa ymm1,ymmword ptr [ebx]      ;ymm1 = indices
        vmovdqa ymm2,ymmword ptr [ecx]      ;ymm2 = mask

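; Perform the gather operation and save the results. For each lane whose
; mask element has its sign bit set, vgatherdps loads the float at
; edx + index*4 (index = the matching dword in ymm1); the other lanes keep
; their previous ymm0 value. The mask register is zeroed on completion.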
        vgatherdps ymm0,[edx+ymm1*4],ymm2   ;ymm0 = gathered elements
        vmovaps ymmword ptr [eax],ymm0      ;save des
        vmovdqa ymmword ptr [ebx],ymm1      ;save indices (unchanged)
        vmovdqa ymmword ptr [ecx],ymm2      ;save mask (all zeros)

        vzeroupper
        pop ebx
        pop ebp
        ret
AvxGatherFloat_ endp


https://github.com/Apress/modern-x86-assembly-language-programming/tree/master/978-1-4842-0065-0_SourceCode/Chapter16/AvxGather
https://books.google.ch/books?id=plInCgAAQBAJ&pg=PA468&lpg=PA468&ots=-MKx-7Fo2x&sig=ACfU3U3sCgSWIuZD5WQgViLqnz2SpdP_Qw&hl=en
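
To spell out what the index register, the *4 scale and the mask do, here is a rough scalar model of the instruction in plain C. It is only a sketch: gather_ps_model and gather_example are made-up names for illustration, not intrinsics or anything from the book, and the test array where x[i] == i is an assumption chosen so the result is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of
 *     vgatherdps dst, [base + idx*scale + disp], mask
 * for eight float lanes. Not an intrinsic, only an illustration. */
void gather_ps_model(float dst[8], const void *base,
                     const int32_t idx[8], int scale, int disp,
                     int32_t mask[8])
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int lane = 0; lane < 8; ++lane) {
        /* Only the sign bit of each mask element is tested. */
        if (mask[lane] < 0) {
            /* Address is computed in bytes: base + index*scale + disp. */
            const char *addr = (const char *)base
                             + (ptrdiff_t)idx[lane] * scale + disp;
            memcpy(&dst[lane], addr, sizeof(float));
        }
        /* Lanes with a clear mask bit keep their previous dst value. */
    }
    /* The real instruction leaves the mask register zeroed on completion. */
    memset(mask, 0, 8 * sizeof(int32_t));
}

/* Gathers the elements asked about in the thread, in index order. */
void gather_example(float out[8])
{
    static float x[1000];
    const int32_t idx[8] = {1, 235, 245, 346, 564, 566, 700, 800};
    int32_t mask[8] = {-1, -1, -1, -1, -1, -1, -1, -1};
    for (int i = 0; i < 1000; ++i)
        x[i] = (float)i;            /* assumed test data: x[i] == i */
    gather_ps_model(out, x, idx, 4, 0, mask);
}
```

So the index register holds eight element numbers, the scale of 4 turns each one into a byte offset (sizeof float), and the displacement is in bytes, which means +32 would shift every lane by 8 floats. The mask lets you skip lanes, for example ones whose index would point outside the array.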

aw27

Transposing a matrix
http://masm32.com/board/index.php?topic=7503.msg81974#msg81974
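
As a rough idea of why gather fits transposition: with a row-major 8x8 float matrix, the indices 0, 8, 16, ..., 56 applied to a base pointer at column c pick out that whole column, which becomes row c of the result. The sketch below models this in plain C. gather8_model and transpose8x8_model are hypothetical names, the loop stands in for an all-ones-mask vgatherdps with scale 4, and this is not necessarily how the linked post does it.

```c
#include <stdint.h>

enum { DIM = 8 };   /* 8 floats = one ymm register */

/* Scalar stand-in for vgatherdps with an all-ones mask and scale 4
 * (hypothetical helper for illustration only). */
void gather8_model(float dst[DIM], const float *base, const int32_t idx[DIM])
{
    for (int lane = 0; lane < DIM; ++lane)
        dst[lane] = base[idx[lane]];
}

/* Transposes a row-major DIM x DIM matrix stored as a flat array. */
void transpose8x8_model(float *out, const float *in)
{
    int32_t idx[DIM];
    for (int lane = 0; lane < DIM; ++lane)
        idx[lane] = lane * DIM;          /* 0, 8, 16, ..., 56: one per row */

    for (int col = 0; col < DIM; ++col) {
        /* Moving the base to in + col makes the same indices read column col. */
        gather8_model(out + col * DIM, in + col, idx);
    }
}
```

The index vector is built once and reused, and only the base address changes per column.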

InfiniteLoop

I understand now, thanks. It could be useful.

Matrix transpose is already fast enough without gather. Multiplication is where performance starts tanking.

aw27

Quote from: InfiniteLoop on January 21, 2020, 11:26:16 PM
Matrix transpose is already fast enough without gather. Multiplication is where performance starts tanking.
I agree.