### Author Topic: Inverse matrix SSE real4 for any size  (Read 575 times)

#### RuiLoureiro

• Member
• Posts: 819
##### Inverse matrix SSE real4 for any size
« on: September 04, 2018, 07:07:33 AM »
Hi all,
Here we have 2 SSE versions to invert any square matrix NxN
and 3 FPU versions. All this work is inside the folder INVERSEMATRIX_v4.
Inside we have all files used in the folder TESTCODE.

I did only tests for 2x2,3x3,4x4,5x5,6x6,7x7,8x8,9x9,10x0,11x11.
All seems to be correct so far.

note1: When we read "... it is an identity matrix" it means that
the product of that matrix with the inverse is the identity matrix:

X(NxN) * X(NxN)^-1 = Identity matrix

note2: The output matrix - the inverse matrix - is Y(0x0) if X is not
invertible.So after this we get "matrix definition error" if we
try to use that Y matrix.

Quote
VERSION 3:
PROCEDURE:  invertmatrix_v3SSE

FILE:            invertmatrix_v3SSE.inc

MACROS:     CopyMatrixXtoW_v3SSE.mac
CleanTriangularInfSupW_v3SSE.mac
GetInverseMatrix_v3SSE.mac

VERSION 4:
PROCEDURE:  invertmatrix_v4SSE

FILE:              invertmatrix_v4SSE.inc

MACROS:      CopyMatrixXtoW_v4SSE.mac
CleanTriangularInfSupW_v4SSE.mac
GetInverseMatrix_v4SSE.mac
TryTheBestPivot_v4SSE.mac

VERSIONS FPU 1,2,3:

PROCEDURE:  invertmatrix_v1FPU
invertmatrix_v2FPU
invertmatrix_v3FPU

FILE:              invertmatrix_v1FPU.inc
invertmatrix_v2FPU.inc
invertmatrix_v3FPU.inc

MACROS:     invertmatrix_v2FPU.mac
invertmatrix_v3FPU.mac

MATRIX DEFINITION:      We must define any matrixX as this

ALIGN 16
dd ?
dd ?
dd M   ; <<--- number of columns
dd M   ; <<--- number of lines
matrixX  dd (M*M) dup (?)

VERIFY SSE/FPU PROCEDURES:

Use invertmatrix_v1FPU.asm/exe, invertmatrix_v2FPU.asm/exe
invertmatrix_v3FPU.asm/exe, invertmatrix_v3SSE.asm/exe or
invertmatrix_v4SSE.asm/exe to see the test results.

Use ExecuteTestInvertMatrix_v1.bat and post the file ResultsTestInvertMatrix_v1.txt.

Good luck
RuiLoureiro
Quote
Gaussian elimination method in 4 steps
-------------------------------------------------
Given a X(NxN) matrix:

STEP1: Alloc/define a matrix W(Nx2N), copy X to the left side of W and add
the identity matrix to the right side;

X =  1   0    4      =>  W = 1   0   4  !  1   0   0
0   1    2                    0   1   2  !  0   1   0
2   0    4                    2   0   4  !  0   0   1

STEP2: Using linear operations(*), set to 1 all elements of the main diagonal (pivot elements)
and set to 0 all elements below that diagonal. We get a triangular superior matrix.
If we get 0 in any element of the main diagonal, the matrix is not invertible,
the determinant is null. Or, at least, it seems to be not invertible. If the
determinant of X is null it is not invertible. (* see documentation)

W =  1   0   4  !  1   0    0   <-- multiply by -2 and add to line 2
0   1   2  !  0   1    0
2   0   4  !  0   0    1   <<-- add here

W =  1   0  !  1   0    0               note: here, we see that adding line 2 we remove the 4 above 2
0   1   2  !  0   1    0                         but it is in the paper only... the procedure doesnt do this
0   0  -4  ! -2   0    1   <-- divide by -4  ==> determinant is -4 ( of X )

W =  1   0   4  !  1   0    0
0   1   2  !  0   1    0
0   0   1  ! 1/2  0  -1/4

STEP3: Set to 0 all elements above the main diagonal. In this way we get an identity matrix
at the left side of W. So the right side is the inverse matrix;

W =  1   0  4  !  1   0    0   <<-- add here
0   1   2  !  0   1    0
0   0   1  ! 1/2  0  -1/4  <-- multiply by -4

W = 1   0   0  ! -1   0    1
0   1   2  !  0   1    0   <<-- add here
0   0   1  ! 1/2  0  -1/4  <-- multiply by -2

W = 1   0   0  ! -1   0    1
0   1   0  ! -1   1   1/2
0   0   1  ! 1/2  0  -1/4

STEP4: Copy the right side of W(Nx2N) to the output matrix Y(NxN). It is the inverse matrix of X.
Y(NxN) = X(NxN)^-1
X^-1 = -1    0     1
-1    1    1/2
1/2   0   -1/4

To test it: X(NxN)*Y(NxN) must be I, the identity matrix. If it is not an identity matrix
something is wrong and Y is not the inverse matrix.

X * X^-1 = [ 1    0    4  * [ -1   0    1     = [-1+2    0     1-1      = [ 1    0     0

0    1    2      -1   1   1/2       -1+1    1     1/2-1/2       0    1     0

2    0    4 ]    1/2  0  -1/4 ]     -2+2   0       2-1  ]        0    0    1 ]
Some results:
Thanks Siekmanski, Enrique,LiaoMi
Code: [Select]
` Siekmanski: ***** Time table - LoopCount =100 000 *****Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)   130  cycles, invertmatrix_v4SSE,  MatrixX2x2  162  cycles, invertmatrix_v3SSE,  MatrixX2x2  302  cycles, invertmatrix_v3FPU,  MatrixX2x2  308  cycles, invertmatrix_v2FPU,  MatrixX2x2  368  cycles, invertmatrix_v1FPU,  MatrixX2x2    211  cycles, invertmatrix_v4SSE,  MatrixX3x3  256  cycles, invertmatrix_v3SSE,  MatrixX3x3  445  cycles, invertmatrix_v2FPU,  MatrixX3x3  453  cycles, invertmatrix_v3FPU,  MatrixX3x3  589  cycles, invertmatrix_v1FPU,  MatrixX3x3  250  cycles, invertmatrix_v4SSE,  MatrixAX3x3  253  cycles, invertmatrix_v3SSE,  MatrixAX3x3  458  cycles, invertmatrix_v2FPU,  MatrixAX3x3  463  cycles, invertmatrix_v3FPU,  MatrixAX3x3  530  cycles, invertmatrix_v1FPU,  MatrixAX3x3   373  cycles, invertmatrix_v4SSE,  MatrixC4x4  469  cycles, invertmatrix_v3SSE,  MatrixC4x4  864  cycles, invertmatrix_v2FPU,  MatrixC4x4  872  cycles, invertmatrix_v3FPU,  MatrixC4x4  978  cycles, invertmatrix_v1FPU,  MatrixC4x4  525  cycles, invertmatrix_v4SSE,  MatrixX5x5  579  cycles, invertmatrix_v3SSE,  MatrixX5x5 1373  cycles, invertmatrix_v2FPU,  MatrixX5x5 1377  cycles, invertmatrix_v3FPU,  MatrixX5x5 1417  cycles, invertmatrix_v1FPU,  MatrixX5x5    742  cycles, invertmatrix_v4SSE,  MatrixX6x6 1433  cycles, invertmatrix_v3SSE,  MatrixX6x6 2107  cycles, invertmatrix_v3FPU,  MatrixX6x6 2120  cycles, invertmatrix_v2FPU,  MatrixX6x6 2209  cycles, invertmatrix_v1FPU,  MatrixX6x6   1047  cycles, invertmatrix_v4SSE,  MatrixX7x7 1175  cycles, invertmatrix_v3SSE,  MatrixX7x7 3032  cycles, invertmatrix_v3FPU,  MatrixX7x7 3250  cycles, invertmatrix_v1FPU,  MatrixX7x7 4244  cycles, invertmatrix_v2FPU,  MatrixX7x7  1376  cycles, invertmatrix_v4SSE,  MatrixA8x8 1476  cycles, invertmatrix_v3SSE,  MatrixA8x8 4253  cycles, invertmatrix_v3FPU,  MatrixA8x8 4279  cycles, invertmatrix_v1FPU,  MatrixA8x8 4341  cycles, invertmatrix_v2FPU,  MatrixA8x8  1823  cycles, invertmatrix_v4SSE,  MatrixX9x9 1958  cycles, invertmatrix_v3SSE,  MatrixX9x9 5712  cycles, invertmatrix_v1FPU,  MatrixX9x9 8440  cycles, invertmatrix_v3FPU,  MatrixX9x9 8502  cycles, invertmatrix_v2FPU,  MatrixX9x9  2267  cycles, invertmatrix_v4SSE,  MatrixX10x10 2422  cycles, invertmatrix_v3SSE,  MatrixX10x10 7507  cycles, invertmatrix_v1FPU,  MatrixX10x10 7627  cycles, invertmatrix_v3FPU,  MatrixX10x10 7663  cycles, invertmatrix_v2FPU,  MatrixX10x10  2880  cycles, invertmatrix_v4SSE,  MatrixX11x11 3612  cycles, invertmatrix_v3SSE,  MatrixX11x11 9862  cycles, invertmatrix_v1FPU,  MatrixX11x1110013  cycles, invertmatrix_v3FPU,  MatrixX11x1110199  cycles, invertmatrix_v2FPU,  MatrixX11x11HSE:  ***** Time table - LoopCount =100 000 *****AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)   176  cycles, invertmatrix_v4SSE,  MatrixX2x2  211  cycles, invertmatrix_v3SSE,  MatrixX2x2  444  cycles, invertmatrix_v3FPU,  MatrixX2x2  471  cycles, invertmatrix_v2FPU,  MatrixX2x2  553  cycles, invertmatrix_v1FPU,  MatrixX2x2    290  cycles, invertmatrix_v4SSE,  MatrixX3x3  327  cycles, invertmatrix_v3SSE,  MatrixX3x3  650  cycles, invertmatrix_v2FPU,  MatrixX3x3  677  cycles, invertmatrix_v3FPU,  MatrixX3x3  789  cycles, invertmatrix_v1FPU,  MatrixX3x3    293  cycles, invertmatrix_v4SSE,  MatrixAX3x3  351  cycles, invertmatrix_v3SSE,  MatrixAX3x3  772  cycles, invertmatrix_v2FPU,  MatrixAX3x3  783  cycles, invertmatrix_v3FPU,  MatrixAX3x3  869  cycles, invertmatrix_v1FPU,  MatrixAX3x3    554  cycles, invertmatrix_v4SSE,  MatrixC4x4  669  cycles, invertmatrix_v3SSE,  MatrixC4x4 1528  cycles, invertmatrix_v2FPU,  MatrixC4x4 1536  cycles, invertmatrix_v1FPU,  MatrixC4x4 1621  cycles, invertmatrix_v3FPU,  MatrixC4x4    992  cycles, invertmatrix_v4SSE,  MatrixX5x5 1036  cycles, invertmatrix_v3SSE,  MatrixX5x5 2562  cycles, invertmatrix_v2FPU,  MatrixX5x5 2607  cycles, invertmatrix_v3FPU,  MatrixX5x5 2623  cycles, invertmatrix_v1FPU,  MatrixX5x5  1441  cycles, invertmatrix_v4SSE,  MatrixX6x6 1568  cycles, invertmatrix_v3SSE,  MatrixX6x6 3770  cycles, invertmatrix_v1FPU,  MatrixX6x6 3928  cycles, invertmatrix_v3FPU,  MatrixX6x6 3936  cycles, invertmatrix_v2FPU,  MatrixX6x6  2067  cycles, invertmatrix_v4SSE,  MatrixX7x7 2271  cycles, invertmatrix_v3SSE,  MatrixX7x7 5457  cycles, invertmatrix_v1FPU,  MatrixX7x7 5593  cycles, invertmatrix_v2FPU,  MatrixX7x7 5673  cycles, invertmatrix_v3FPU,  MatrixX7x7  2971  cycles, invertmatrix_v3SSE,  MatrixA8x8 2979  cycles, invertmatrix_v4SSE,  MatrixA8x8 7169  cycles, invertmatrix_v1FPU,  MatrixA8x8 7517  cycles, invertmatrix_v2FPU,  MatrixA8x8 7735  cycles, invertmatrix_v3FPU,  MatrixA8x8  3825  cycles, invertmatrix_v4SSE,  MatrixX9x9 4148  cycles, invertmatrix_v3SSE,  MatrixX9x9 9413  cycles, invertmatrix_v1FPU,  MatrixX9x910546  cycles, invertmatrix_v2FPU,  MatrixX9x910879  cycles, invertmatrix_v3FPU,  MatrixX9x9  4724  cycles, invertmatrix_v4SSE,  MatrixX10x10 5054  cycles, invertmatrix_v3SSE,  MatrixX10x1011902  cycles, invertmatrix_v1FPU,  MatrixX10x1012450  cycles, invertmatrix_v2FPU,  MatrixX10x1012664  cycles, invertmatrix_v3FPU,  MatrixX10x10  5896  cycles, invertmatrix_v4SSE,  MatrixX11x11 6170  cycles, invertmatrix_v3SSE,  MatrixX11x1115358  cycles, invertmatrix_v1FPU,  MatrixX11x1115684  cycles, invertmatrix_v2FPU,  MatrixX11x1116379  cycles, invertmatrix_v3FPU,  MatrixX11x11LiaoMe:  ***** Time table - LoopCount =100 000 *****Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)  103  cycles, invertmatrix_v4SSE,  MatrixX2x2 134  cycles, invertmatrix_v3SSE,  MatrixX2x2 228  cycles, invertmatrix_v3FPU,  MatrixX2x2 240  cycles, invertmatrix_v2FPU,  MatrixX2x2 284  cycles, invertmatrix_v1FPU,  MatrixX2x2  145  cycles, invertmatrix_v4SSE,  MatrixAX3x3 216  cycles, invertmatrix_v3SSE,  MatrixAX3x3 348  cycles, invertmatrix_v2FPU,  MatrixAX3x3 351  cycles, invertmatrix_v3FPU,  MatrixAX3x3 395  cycles, invertmatrix_v1FPU,  MatrixAX3x3 153  cycles, invertmatrix_v4SSE,  MatrixX3x3 191  cycles, invertmatrix_v3SSE,  MatrixX3x3 338  cycles, invertmatrix_v2FPU,  MatrixX3x3 400  cycles, invertmatrix_v1FPU,  MatrixX3x3 447  cycles, invertmatrix_v3FPU,  MatrixX3x3  266  cycles, invertmatrix_v4SSE,  MatrixC4x4 327  cycles, invertmatrix_v3SSE,  MatrixC4x4 638  cycles, invertmatrix_v3FPU,  MatrixC4x4 698  cycles, invertmatrix_v1FPU,  MatrixC4x4 721  cycles, invertmatrix_v2FPU,  MatrixC4x4  365  cycles, invertmatrix_v4SSE,  MatrixX5x5 410  cycles, invertmatrix_v3SSE,  MatrixX5x51013  cycles, invertmatrix_v3FPU,  MatrixX5x51024  cycles, invertmatrix_v2FPU,  MatrixX5x51131  cycles, invertmatrix_v1FPU,  MatrixX5x5  583  cycles, invertmatrix_v3SSE,  MatrixX6x6 825  cycles, invertmatrix_v4SSE,  MatrixX6x61536  cycles, invertmatrix_v2FPU,  MatrixX6x61571  cycles, invertmatrix_v1FPU,  MatrixX6x61581  cycles, invertmatrix_v3FPU,  MatrixX6x6 720  cycles, invertmatrix_v4SSE,  MatrixX7x7 804  cycles, invertmatrix_v3SSE,  MatrixX7x72182  cycles, invertmatrix_v3FPU,  MatrixX7x72290  cycles, invertmatrix_v2FPU,  MatrixX7x72447  cycles, invertmatrix_v1FPU,  MatrixX7x7  888  cycles, invertmatrix_v4SSE,  MatrixA8x8 990  cycles, invertmatrix_v3SSE,  MatrixA8x83122  cycles, invertmatrix_v1FPU,  MatrixA8x83253  cycles, invertmatrix_v3FPU,  MatrixA8x83389  cycles, invertmatrix_v2FPU,  MatrixA8x8 1217  cycles, invertmatrix_v4SSE,  MatrixX9x91336  cycles, invertmatrix_v3SSE,  MatrixX9x94272  cycles, invertmatrix_v1FPU,  MatrixX9x96311  cycles, invertmatrix_v3FPU,  MatrixX9x96584  cycles, invertmatrix_v2FPU,  MatrixX9x91472  cycles, invertmatrix_v4SSE,  MatrixX10x101614  cycles, invertmatrix_v3SSE,  MatrixX10x105594  cycles, invertmatrix_v3FPU,  MatrixX10x105610  cycles, invertmatrix_v1FPU,  MatrixX10x106613  cycles, invertmatrix_v2FPU,  MatrixX10x102027  cycles, invertmatrix_v4SSE,  MatrixX11x112136  cycles, invertmatrix_v3SSE,  MatrixX11x117452  cycles, invertmatrix_v3FPU,  MatrixX11x117468  cycles, invertmatrix_v2FPU,  MatrixX11x117558  cycles, invertmatrix_v1FPU,  MatrixX11x11`
« Last Edit: September 09, 2018, 01:44:59 AM by RuiLoureiro »

#### Siekmanski

• Member
• Posts: 1726
##### Re: Inverse matrix SSE real4 for any size
« Reply #1 on: September 04, 2018, 11:26:09 PM »
Hi Rui,

Here are the results.
Creative coders use backward thinking techniques as a strategy.

#### HSE

• Member
• Posts: 890
• <AMD>< 7-32>
##### Re: Inverse matrix SSE real4 for any size
« Reply #2 on: September 05, 2018, 02:05:33 AM »
Hi Rui!

#### LiaoMi

• Member
• Posts: 333
##### Re: Inverse matrix SSE real4 for any size
« Reply #3 on: September 06, 2018, 08:11:47 AM »
Hi Rui

#### guga

• Member
• Posts: 857
• Assembly is a state of art.
##### Re: Inverse matrix SSE real4 for any size
« Reply #4 on: January 22, 2019, 06:21:06 AM »

Essa versão é para Real8 tb ?

Abs

guga
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### Raistlin

• Member
• Posts: 408
##### Re: Inverse matrix SSE real4 for any size
« Reply #5 on: January 22, 2019, 08:48:04 AM »
Cool Bra (SA synonym for brother)  tested good so far.
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...