8x8 Matrix Transpose using AVX

Started by aw27, July 09, 2018, 11:07:08 PM

aw27

Quote from: zedd151 on July 14, 2018, 06:19:35 PM
Seriously though, how often would the average person need to transpose a matrix of any size?    :biggrin:
Not the average person but here most people are scientists  :biggrin:

zedd151

Quote from: AW on July 14, 2018, 06:37:02 PM
Not the average person but here most people are scientists  :biggrin:

Maybe I should have replaced 'average' in my post with 'normal'   :P

:badgrin:

Siekmanski

Quote from: jj2007 on July 14, 2018, 06:10:10 PM
Quote from: zedd151 on July 10, 2018, 08:44:14 PM
Damn, look what I started.   :greensml:

Clicking on "Show unread posts since last visit", there are FOUR matrix transpose threads active right now. And Marinus has just found a brand new VINSERTPS instruction. It's a mad place but I like it :icon_mrgreen:

:biggrin:

My excuse is, it's described in the section header.  :greensml:
Creative coders use backward thinking techniques as a strategy.

daydreamer

Quote from: AW on July 10, 2018, 04:32:45 PM
Hi Zed  :biggrin: ,
I will have to think about a performance test then.  :t
Now, you calculator nerds, did anyone code a game on it?
my non-asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

aw27

Another approach, shorter and probably faster (untested). It requires AVX2 because it uses the VPGATHERDD instruction (Gather Packed Dword Values Using Signed Dword Indices).


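; ************ TRANSPOSING CODE START ***********************
lea r8, invalues                     ; r8 = source matrix; advances one column per pass
mov ecx, 8                           ; 8 passes, one per output row (LOOP counts RCX down)
lea r9, outvalues                    ; r9 = current destination row
vmovdqa ymm0, YMMWORD PTR Index      ; per-lane offsets for the gather (scale is 1, so in bytes)
@another:
vmovdqa ymm2, YMMWORD PTR _mask      ; reload the mask every pass: VPGATHERDD clears it on completion
VPGATHERDD ymm1, [r8+ymm0*1], ymm2   ; ymm1 = 8 dwords gathered from [r8 + offset of each lane]
vmovdqa YMMWORD PTR [r9], ymm1       ; store them as one row of the result (needs 32-byte alignment)
add r8, sizeof DWORD                 ; next source column
add r9, 8*sizeof DWORD               ; next destination row
loop @another
; ************ TRANSPOSING CODE END ************************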



Transposing a 8x8 Matrix
Before:
row 0   1 2 3 4 5 6 7 8
row 1   9 10 11 12 13 14 15 16
row 2   17 18 19 20 21 22 23 24
row 3   25 26 27 28 29 30 31 32
row 4   33 34 35 36 37 38 39 40
row 5   41 42 43 44 45 46 47 48
row 6   49 50 51 52 53 54 55 56
row 7   57 58 59 60 61 62 63 64
After:
row 0   1 9 17 25 33 41 49 57
row 1   2 10 18 26 34 42 50 58
row 2   3 11 19 27 35 43 51 59
row 3   4 12 20 28 36 44 52 60
row 4   5 13 21 29 37 45 53 61
row 5   6 14 22 30 38 46 54 62
row 6   7 15 23 31 39 47 55 63
row 7   8 16 24 32 40 48 56 64
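
For readers without an AVX2 machine at hand, the loop above can be modelled in plain C. This is only a scalar sketch, not the posted routine: the post does not show the contents of Index or _mask, so the table `index_bytes` below (byte offsets 0, 32, ..., 224, one per source row) and the "all lanes enabled" mask are assumptions, and the names `gather_dwords` and `transpose8x8_gather_model` are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { N = 8 };

/* ASSUMPTION: Index is not shown in the post. Because the gather uses
   scale 1, the offsets are taken here to be in bytes: row i starts at i*32. */
static const int32_t index_bytes[N] = {0, 32, 64, 96, 128, 160, 192, 224};

/* Scalar model of one VPGATHERDD with every mask lane enabled (assumed):
   lane i loads the dword found at base + idx[i] * 1. */
static void gather_dwords(int32_t dst[N], const unsigned char *base,
                          const int32_t idx[N])
{
    for (int lane = 0; lane < N; ++lane)
        memcpy(&dst[lane], base + idx[lane], sizeof(int32_t));
}

/* Model of the posted loop: 8 gathers, each one reading a source column
   and writing it out as a destination row. */
void transpose8x8_gather_model(const int32_t in[N * N], int32_t out[N * N])
{
    const unsigned char *src = (const unsigned char *)in; /* plays the role of r8 */
    int32_t *dst = out;                                   /* plays the role of r9 */

    for (int pass = 0; pass < N; ++pass) {                /* mov ecx, 8 ... loop  */
        gather_dwords(dst, src, index_bytes);
        src += sizeof(int32_t);                           /* add r8, sizeof DWORD   */
        dst += N;                                         /* add r9, 8*sizeof DWORD */
    }
}
```

Filling a 64-element array with 1..64 and calling `transpose8x8_gather_model(in, out)` should reproduce the Before/After listing above. The model says nothing about speed, which is still the untested part.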