Hello all,
How are the gather instructions useful and how do they work exactly?
The Intel instruction reference is awful.
e.g.
VGATHERDPS ymm0, [rcx + ymm1 * 4 + 32], ymm2
rcx = address of some array of float values. 32 is an offset. I have no idea what ymm1 does or how *4 affects it. Testing it just seems to return every even element, i.e. 0, 2, 4, 6, 8.
ymm2 is a mask. I don't understand how masks are ever of any use, so I assume it can be ignored and left as all ones.
Can I use this to extract elements, say, 1, 235, 245, 346, 564, 566, 700, 800 from some float array in the correct order?
If not then it seems somewhat useless.
Hi,
.model flat,c
.code
; extern "C" void AvxGatherFloat_(YmmVal* des, YmmVal* indices, YmmVal* mask, const float* x);
;
; Description: The following function demonstrates use of the
; vgatherdps instruction.
;
; Requires: AVX2
AvxGatherFloat_ proc
        push ebp
        mov ebp,esp
        push ebx
; Load argument values. The contents of des are loaded into ymm0
; prior to execution of the vgatherdps instruction in order to
; demonstrate the conditional effects of the control mask.
        mov eax,[ebp+8]                     ;eax = ptr to des
        mov ebx,[ebp+12]                    ;ebx = ptr to indices
        mov ecx,[ebp+16]                    ;ecx = ptr to mask
        mov edx,[ebp+20]                    ;edx = ptr to x
        vmovaps ymm0,ymmword ptr [eax]      ;ymm0 = des (initial values)
        vmovdqa ymm1,ymmword ptr [ebx]      ;ymm1 = indices
        vmovdqa ymm2,ymmword ptr [ecx]      ;ymm2 = mask
; Perform the gather operation and save the results.
        vgatherdps ymm0,[edx+ymm1*4],ymm2   ;ymm0 = gathered elements
        vmovaps ymmword ptr [eax],ymm0      ;save des
        vmovdqa ymmword ptr [ebx],ymm1      ;save indices (unchanged)
        vmovdqa ymmword ptr [ecx],ymm2      ;save mask (all zeros)
        vzeroupper
        pop ebx
        pop ebp
        ret
AvxGatherFloat_ endp
        end
https://github.com/Apress/modern-x86-assembly-language-programming/tree/master/978-1-4842-0065-0_SourceCode/Chapter16/AvxGather
https://books.google.ch/books?id=plInCgAAQBAJ&pg=PA468&lpg=PA468&ots=-MKx-7Fo2x&sig=ACfU3U3sCgSWIuZD5WQgViLqnz2SpdP_Qw&hl=en
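A rough scalar model of what vgatherdps does may help with the addressing and mask questions. This is a sketch in plain C, not the book's code and not real AVX2; the names gather_dps_model and gather_demo are made up for illustration. Each of the eight lanes loads one float from base + index*scale + displacement, where the index is that lane's signed 32-bit value from the index register (ymm1). With scale 4 the indices count floats, and a displacement of 32 bytes simply skips the first 8 floats. Only the top bit of each mask element is tested: lanes with that bit clear keep whatever was already in the destination, and the instruction zeroes the mask register when it finishes, so the mask has to be reloaded before every gather.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8  /* a ymm register holds eight 32-bit lanes */

/* Scalar model of: vgatherdps dst, [base + idx*scale + disp], mask
   Illustrative only; the function name and signature are assumptions. */
static void gather_dps_model(float dst[LANES], const void *base,
                             const int32_t idx[LANES], int scale,
                             int32_t disp, uint32_t mask[LANES])
{
    const unsigned char *p = (const unsigned char *)base;
    for (int lane = 0; lane < LANES; ++lane) {
        /* Only the sign bit (bit 31) of each mask element is tested. */
        if (mask[lane] & 0x80000000u) {
            /* Indices are signed and scaled to a byte offset. */
            ptrdiff_t offset = (ptrdiff_t)idx[lane] * scale + disp;
            memcpy(&dst[lane], p + offset, sizeof(float));
        }
        /* The hardware leaves the mask register all zeros afterwards. */
        mask[lane] = 0;
    }
}

/* Hypothetical demo: x[i] == i, so each gathered value equals its index. */
static void gather_demo(float out[LANES])
{
    static float x[1024];
    const int32_t idx[LANES] = {1, 235, 245, 346, 564, 566, 700, 800};
    uint32_t mask[LANES];

    for (int i = 0; i < 1024; ++i)
        x[i] = (float)i;
    for (int i = 0; i < LANES; ++i) {
        out[i] = -1.0f;          /* old destination contents */
        mask[i] = 0xFFFFFFFFu;   /* lane enabled */
    }
    mask[3] = 0;                 /* lane 3 disabled: keeps -1.0f */

    gather_dps_model(out, x, idx, 4, 0, mask);
    assert(out[1] == 235.0f);    /* lane order follows index order */
}
```

So the answer to the original question is yes: put 1, 235, 245, 346, 564, 566, 700, 800 in the index register as 32-bit integers and the elements come back in that lane order. The mask is mostly useful for loop tails and conditional loads, where some lanes must not touch memory. Note also that the destination, index, and mask registers must all be different registers.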
Quote from: InfiniteLoop on January 21, 2020, 05:21:31 AM
How are the gather instructions useful and how do they work exactly?
One use is transposing a matrix:
http://masm32.com/board/index.php?topic=7503.msg81974#msg81974
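As a rough idea of how a gather maps onto a transpose (not necessarily what the linked post does), here is a scalar C sketch that assumes a row-major 8x8 float matrix. The helper names column_indices and transpose_by_gather are hypothetical. Column c of the source lives at element indices c, c+8, c+16, and so on; gathering with those indices yields one full row of the transposed matrix, which can then be stored contiguously.

```c
#include <stdint.h>

#define N 8  /* assumed matrix dimension: one ymm register per row */

/* Element indices of column `col` in a row-major N x N matrix.
   These are the values the index register would hold. */
static void column_indices(int32_t idx[N], int col)
{
    for (int lane = 0; lane < N; ++lane)
        idx[lane] = lane * N + col;
}

/* Transpose src into dst one output row at a time. The inner loop
   stands in for a single vgatherdps with scale 4 and an all-ones mask. */
static void transpose_by_gather(float dst[N * N], const float src[N * N])
{
    int32_t idx[N];
    for (int col = 0; col < N; ++col) {
        column_indices(idx, col);
        for (int lane = 0; lane < N; ++lane)
            dst[col * N + lane] = src[idx[lane]];
    }
}
```

In real code the index vector could stay constant at 0, 8, 16, ..., 56 while the base pointer advances by one float per column, which avoids rebuilding the indices. The mask would still need reloading each iteration because the instruction clears it.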
I understand now thanks. It could be useful.
Matrix transpose is already fast enough without gather. Multiplication is where performance starts tanking.
Quote from: InfiniteLoop on January 21, 2020, 11:26:16 PM
Matrix transpose is already fast enough without gather. Multiplication is where performance starts tanking.
I agree.