Hi all, Here we have 2 SSE versions to invert any square matrix NxN and 3 FPU versions. All this work is inside the folder INVERSEMATRIX_v4. Inside we have all files used in the folder TESTCODE. I did only tests for 2x2,3x3,4x4,5x5,6x6,7x7,8x8,9x9,10x0,11x11. All seems to be correct so far.note1: When we read "... it is an identity matrix" it means that the product of that matrix with the inverse is the identity matrix: X(NxN) * X(NxN)^-1 = Identity matrixnote2: The output matrix - the inverse matrix - is Y(0x0) if X is not invertible.So after this we get "matrix definition error" if we try to use that Y matrix.Quote
VERSION 3:
PROCEDURE: invertmatrix_v3SSE
FILE: invertmatrix_v3SSE.inc
MACROS: CopyMatrixXtoW_v3SSE.mac
CleanTriangularInfSupW_v3SSE.mac
GetInverseMatrix_v3SSE.mac
VERSION 4:
PROCEDURE: invertmatrix_v4SSE
FILE: invertmatrix_v4SSE.inc
MACROS: CopyMatrixXtoW_v4SSE.mac
CleanTriangularInfSupW_v4SSE.mac
GetInverseMatrix_v4SSE.mac
TryTheBestPivot_v4SSE.mac
VERSIONS FPU 1,2,3:
PROCEDURE: invertmatrix_v1FPU
invertmatrix_v2FPU
invertmatrix_v3FPU
FILE: invertmatrix_v1FPU.inc
invertmatrix_v2FPU.inc
invertmatrix_v3FPU.inc
MACROS: invertmatrix_v2FPU.mac
invertmatrix_v3FPU.mac
DOCUMENTATION: TEXT_ABOUT_INVERSE_SSE_REAL4.txt
MATRIX DEFINITION: We must define any matrixX as this
ALIGN 16
dd ?
dd ?
dd M ; <<--- number of columns
dd M ; <<--- number of lines
matrixX dd (M*M) dup (?)
VERIFY SSE/FPU PROCEDURES:
Use invertmatrix_v1FPU.asm/exe, invertmatrix_v2FPU.asm/exe
invertmatrix_v3FPU.asm/exe, invertmatrix_v3SSE.asm/exe or
invertmatrix_v4SSE.asm/exe to see the test results.
Please test it in your CPU (i5/i7/AMD).
Use ExecuteTestInvertMatrix_v1.bat and post the file ResultsTestInvertMatrix_v1.txt.
Good luckRuiLoureiroQuote
Gaussian elimination method in 4 steps
-------------------------------------------------
Given a X(NxN) matrix:
STEP1: Alloc/define a matrix W(Nx2N), copy X to the left side of W and add
the identity matrix to the right side;
X = 1 0 4 => W = 1 0 4 ! 1 0 0
0 1 2 0 1 2 ! 0 1 0
2 0 4 2 0 4 ! 0 0 1
STEP2: Using linear operations(*), set to 1 all elements of the main diagonal (pivot elements)
and set to 0 all elements below that diagonal. We get a triangular superior matrix.
If we get 0 in any element of the main diagonal, the matrix is not invertible,
the determinant is null. Or, at least, it seems to be not invertible. If the
determinant of X is null it is not invertible. (* see documentation)
W = 1 0 4 ! 1 0 0 <-- multiply by -2 and add to line 2
0 1 2 ! 0 1 0
2 0 4 ! 0 0 1 <<-- add here
W = 1 0 4 ! 1 0 0 note: here, we see that adding line 2 we remove the 4 above 2
0 1 2 ! 0 1 0 but it is in the paper only... the procedure doesnt do this
0 0 -4 ! -2 0 1 <-- divide by -4 ==> determinant is -4 ( of X )
W = 1 0 4 ! 1 0 0
0 1 2 ! 0 1 0
0 0 1 ! 1/2 0 -1/4
STEP3: Set to 0 all elements above the main diagonal. In this way we get an identity matrix
at the left side of W. So the right side is the inverse matrix;
W = 1 0 4 ! 1 0 0 <<-- add here
0 1 2 ! 0 1 0
0 0 1 ! 1/2 0 -1/4 <-- multiply by -4
W = 1 0 0 ! -1 0 1
0 1 2 ! 0 1 0 <<-- add here
0 0 1 ! 1/2 0 -1/4 <-- multiply by -2
W = 1 0 0 ! -1 0 1
0 1 0 ! -1 1 1/2
0 0 1 ! 1/2 0 -1/4
STEP4: Copy the right side of W(Nx2N) to the output matrix Y(NxN). It is the inverse matrix of X.
Y(NxN) = X(NxN)^-1
X^-1 = -1 0 1
-1 1 1/2
1/2 0 -1/4
To test it: X(NxN)*Y(NxN) must be I, the identity matrix. If it is not an identity matrix
something is wrong and Y is not the inverse matrix.
X * X^-1 = [ 1 0 4 * [ -1 0 1 = [-1+2 0 1-1 = [ 1 0 0
0 1 2 -1 1 1/2 -1+1 1 1/2-1/2 0 1 0
2 0 4 ] 1/2 0 -1/4 ] -2+2 0 2-1 ] 0 0 1 ]
Some results: :t :t :t
Thanks Siekmanski, Enrique,LiaoMi
Siekmanski:
***** Time table - LoopCount =100 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
130 cycles, invertmatrix_v4SSE, MatrixX2x2
162 cycles, invertmatrix_v3SSE, MatrixX2x2
302 cycles, invertmatrix_v3FPU, MatrixX2x2
308 cycles, invertmatrix_v2FPU, MatrixX2x2
368 cycles, invertmatrix_v1FPU, MatrixX2x2
211 cycles, invertmatrix_v4SSE, MatrixX3x3
256 cycles, invertmatrix_v3SSE, MatrixX3x3
445 cycles, invertmatrix_v2FPU, MatrixX3x3
453 cycles, invertmatrix_v3FPU, MatrixX3x3
589 cycles, invertmatrix_v1FPU, MatrixX3x3
250 cycles, invertmatrix_v4SSE, MatrixAX3x3
253 cycles, invertmatrix_v3SSE, MatrixAX3x3
458 cycles, invertmatrix_v2FPU, MatrixAX3x3
463 cycles, invertmatrix_v3FPU, MatrixAX3x3
530 cycles, invertmatrix_v1FPU, MatrixAX3x3
373 cycles, invertmatrix_v4SSE, MatrixC4x4
469 cycles, invertmatrix_v3SSE, MatrixC4x4
864 cycles, invertmatrix_v2FPU, MatrixC4x4
872 cycles, invertmatrix_v3FPU, MatrixC4x4
978 cycles, invertmatrix_v1FPU, MatrixC4x4
525 cycles, invertmatrix_v4SSE, MatrixX5x5
579 cycles, invertmatrix_v3SSE, MatrixX5x5
1373 cycles, invertmatrix_v2FPU, MatrixX5x5
1377 cycles, invertmatrix_v3FPU, MatrixX5x5
1417 cycles, invertmatrix_v1FPU, MatrixX5x5
742 cycles, invertmatrix_v4SSE, MatrixX6x6
1433 cycles, invertmatrix_v3SSE, MatrixX6x6
2107 cycles, invertmatrix_v3FPU, MatrixX6x6
2120 cycles, invertmatrix_v2FPU, MatrixX6x6
2209 cycles, invertmatrix_v1FPU, MatrixX6x6
1047 cycles, invertmatrix_v4SSE, MatrixX7x7
1175 cycles, invertmatrix_v3SSE, MatrixX7x7
3032 cycles, invertmatrix_v3FPU, MatrixX7x7
3250 cycles, invertmatrix_v1FPU, MatrixX7x7
4244 cycles, invertmatrix_v2FPU, MatrixX7x7
1376 cycles, invertmatrix_v4SSE, MatrixA8x8
1476 cycles, invertmatrix_v3SSE, MatrixA8x8
4253 cycles, invertmatrix_v3FPU, MatrixA8x8
4279 cycles, invertmatrix_v1FPU, MatrixA8x8
4341 cycles, invertmatrix_v2FPU, MatrixA8x8
1823 cycles, invertmatrix_v4SSE, MatrixX9x9
1958 cycles, invertmatrix_v3SSE, MatrixX9x9
5712 cycles, invertmatrix_v1FPU, MatrixX9x9
8440 cycles, invertmatrix_v3FPU, MatrixX9x9
8502 cycles, invertmatrix_v2FPU, MatrixX9x9
2267 cycles, invertmatrix_v4SSE, MatrixX10x10
2422 cycles, invertmatrix_v3SSE, MatrixX10x10
7507 cycles, invertmatrix_v1FPU, MatrixX10x10
7627 cycles, invertmatrix_v3FPU, MatrixX10x10
7663 cycles, invertmatrix_v2FPU, MatrixX10x10
2880 cycles, invertmatrix_v4SSE, MatrixX11x11
3612 cycles, invertmatrix_v3SSE, MatrixX11x11
9862 cycles, invertmatrix_v1FPU, MatrixX11x11
10013 cycles, invertmatrix_v3FPU, MatrixX11x11
10199 cycles, invertmatrix_v2FPU, MatrixX11x11
HSE:
***** Time table - LoopCount =100 000 *****
AMD A6-3500 APU with Radeon(tm) HD Graphics (SSE3)
176 cycles, invertmatrix_v4SSE, MatrixX2x2
211 cycles, invertmatrix_v3SSE, MatrixX2x2
444 cycles, invertmatrix_v3FPU, MatrixX2x2
471 cycles, invertmatrix_v2FPU, MatrixX2x2
553 cycles, invertmatrix_v1FPU, MatrixX2x2
290 cycles, invertmatrix_v4SSE, MatrixX3x3
327 cycles, invertmatrix_v3SSE, MatrixX3x3
650 cycles, invertmatrix_v2FPU, MatrixX3x3
677 cycles, invertmatrix_v3FPU, MatrixX3x3
789 cycles, invertmatrix_v1FPU, MatrixX3x3
293 cycles, invertmatrix_v4SSE, MatrixAX3x3
351 cycles, invertmatrix_v3SSE, MatrixAX3x3
772 cycles, invertmatrix_v2FPU, MatrixAX3x3
783 cycles, invertmatrix_v3FPU, MatrixAX3x3
869 cycles, invertmatrix_v1FPU, MatrixAX3x3
554 cycles, invertmatrix_v4SSE, MatrixC4x4
669 cycles, invertmatrix_v3SSE, MatrixC4x4
1528 cycles, invertmatrix_v2FPU, MatrixC4x4
1536 cycles, invertmatrix_v1FPU, MatrixC4x4
1621 cycles, invertmatrix_v3FPU, MatrixC4x4
992 cycles, invertmatrix_v4SSE, MatrixX5x5
1036 cycles, invertmatrix_v3SSE, MatrixX5x5
2562 cycles, invertmatrix_v2FPU, MatrixX5x5
2607 cycles, invertmatrix_v3FPU, MatrixX5x5
2623 cycles, invertmatrix_v1FPU, MatrixX5x5
1441 cycles, invertmatrix_v4SSE, MatrixX6x6
1568 cycles, invertmatrix_v3SSE, MatrixX6x6
3770 cycles, invertmatrix_v1FPU, MatrixX6x6
3928 cycles, invertmatrix_v3FPU, MatrixX6x6
3936 cycles, invertmatrix_v2FPU, MatrixX6x6
2067 cycles, invertmatrix_v4SSE, MatrixX7x7
2271 cycles, invertmatrix_v3SSE, MatrixX7x7
5457 cycles, invertmatrix_v1FPU, MatrixX7x7
5593 cycles, invertmatrix_v2FPU, MatrixX7x7
5673 cycles, invertmatrix_v3FPU, MatrixX7x7
2971 cycles, invertmatrix_v3SSE, MatrixA8x8
2979 cycles, invertmatrix_v4SSE, MatrixA8x8
7169 cycles, invertmatrix_v1FPU, MatrixA8x8
7517 cycles, invertmatrix_v2FPU, MatrixA8x8
7735 cycles, invertmatrix_v3FPU, MatrixA8x8
3825 cycles, invertmatrix_v4SSE, MatrixX9x9
4148 cycles, invertmatrix_v3SSE, MatrixX9x9
9413 cycles, invertmatrix_v1FPU, MatrixX9x9
10546 cycles, invertmatrix_v2FPU, MatrixX9x9
10879 cycles, invertmatrix_v3FPU, MatrixX9x9
4724 cycles, invertmatrix_v4SSE, MatrixX10x10
5054 cycles, invertmatrix_v3SSE, MatrixX10x10
11902 cycles, invertmatrix_v1FPU, MatrixX10x10
12450 cycles, invertmatrix_v2FPU, MatrixX10x10
12664 cycles, invertmatrix_v3FPU, MatrixX10x10
5896 cycles, invertmatrix_v4SSE, MatrixX11x11
6170 cycles, invertmatrix_v3SSE, MatrixX11x11
15358 cycles, invertmatrix_v1FPU, MatrixX11x11
15684 cycles, invertmatrix_v2FPU, MatrixX11x11
16379 cycles, invertmatrix_v3FPU, MatrixX11x11
LiaoMe:
***** Time table - LoopCount =100 000 *****
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz (SSE4)
103 cycles, invertmatrix_v4SSE, MatrixX2x2
134 cycles, invertmatrix_v3SSE, MatrixX2x2
228 cycles, invertmatrix_v3FPU, MatrixX2x2
240 cycles, invertmatrix_v2FPU, MatrixX2x2
284 cycles, invertmatrix_v1FPU, MatrixX2x2
145 cycles, invertmatrix_v4SSE, MatrixAX3x3
216 cycles, invertmatrix_v3SSE, MatrixAX3x3
348 cycles, invertmatrix_v2FPU, MatrixAX3x3
351 cycles, invertmatrix_v3FPU, MatrixAX3x3
395 cycles, invertmatrix_v1FPU, MatrixAX3x3
153 cycles, invertmatrix_v4SSE, MatrixX3x3
191 cycles, invertmatrix_v3SSE, MatrixX3x3
338 cycles, invertmatrix_v2FPU, MatrixX3x3
400 cycles, invertmatrix_v1FPU, MatrixX3x3
447 cycles, invertmatrix_v3FPU, MatrixX3x3
266 cycles, invertmatrix_v4SSE, MatrixC4x4
327 cycles, invertmatrix_v3SSE, MatrixC4x4
638 cycles, invertmatrix_v3FPU, MatrixC4x4
698 cycles, invertmatrix_v1FPU, MatrixC4x4
721 cycles, invertmatrix_v2FPU, MatrixC4x4
365 cycles, invertmatrix_v4SSE, MatrixX5x5
410 cycles, invertmatrix_v3SSE, MatrixX5x5
1013 cycles, invertmatrix_v3FPU, MatrixX5x5
1024 cycles, invertmatrix_v2FPU, MatrixX5x5
1131 cycles, invertmatrix_v1FPU, MatrixX5x5
583 cycles, invertmatrix_v3SSE, MatrixX6x6
825 cycles, invertmatrix_v4SSE, MatrixX6x6
1536 cycles, invertmatrix_v2FPU, MatrixX6x6
1571 cycles, invertmatrix_v1FPU, MatrixX6x6
1581 cycles, invertmatrix_v3FPU, MatrixX6x6
720 cycles, invertmatrix_v4SSE, MatrixX7x7
804 cycles, invertmatrix_v3SSE, MatrixX7x7
2182 cycles, invertmatrix_v3FPU, MatrixX7x7
2290 cycles, invertmatrix_v2FPU, MatrixX7x7
2447 cycles, invertmatrix_v1FPU, MatrixX7x7
888 cycles, invertmatrix_v4SSE, MatrixA8x8
990 cycles, invertmatrix_v3SSE, MatrixA8x8
3122 cycles, invertmatrix_v1FPU, MatrixA8x8
3253 cycles, invertmatrix_v3FPU, MatrixA8x8
3389 cycles, invertmatrix_v2FPU, MatrixA8x8
1217 cycles, invertmatrix_v4SSE, MatrixX9x9
1336 cycles, invertmatrix_v3SSE, MatrixX9x9
4272 cycles, invertmatrix_v1FPU, MatrixX9x9
6311 cycles, invertmatrix_v3FPU, MatrixX9x9
6584 cycles, invertmatrix_v2FPU, MatrixX9x9
1472 cycles, invertmatrix_v4SSE, MatrixX10x10
1614 cycles, invertmatrix_v3SSE, MatrixX10x10
5594 cycles, invertmatrix_v3FPU, MatrixX10x10
5610 cycles, invertmatrix_v1FPU, MatrixX10x10
6613 cycles, invertmatrix_v2FPU, MatrixX10x10
2027 cycles, invertmatrix_v4SSE, MatrixX11x11
2136 cycles, invertmatrix_v3SSE, MatrixX11x11
7452 cycles, invertmatrix_v3FPU, MatrixX11x11
7468 cycles, invertmatrix_v2FPU, MatrixX11x11
7558 cycles, invertmatrix_v1FPU, MatrixX11x11
Hi Rui,
Here are the results.
Hi Rui!
Hi Rui :P
Excelente Rui, Obrigado.
Essa versão é para Real8 tb ?
Abs
guga
Cool Bra (SA synonym for brother) tested good so far.