Hi all,
Here we have two SSE versions and one FPU version that compute the determinant of any square MxM matrix. We also have some SSE and FPU versions that use the Laplace method.
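The Laplace method expands the determinant along one row into signed minors: det A = sum over j of (-1)^j * a[0][j] * det(A with row 0 and column j removed). The recursive C sketch below illustrates only that math in single precision (REAL4). It is not the unrolled SSE/FPU code in the Laplace*.inc files, and det_laplace and LAPLACE_MAX_N are hypothetical names.

```c
#include <math.h>
#include <stddef.h>

#define LAPLACE_MAX_N 8   /* hypothetical size limit for this sketch */

/* Determinant of the n x n row-major matrix a by cofactor (Laplace)
   expansion along the first row. Requires 1 <= n <= LAPLACE_MAX_N,
   otherwise returns NAN. The work grows roughly like n!. */
float det_laplace(const float *a, size_t n)
{
    if (n == 0 || n > LAPLACE_MAX_N)
        return NAN;
    if (n == 1)
        return a[0];
    if (n == 2)
        return a[0] * a[3] - a[1] * a[2];   /* ad - bc */

    float minor[(LAPLACE_MAX_N - 1) * (LAPLACE_MAX_N - 1)];
    float det = 0.0f;
    float sign = 1.0f;                       /* (-1)^col */
    for (size_t col = 0; col < n; col++) {
        /* Build the minor: drop row 0 and column col. */
        size_t k = 0;
        for (size_t r = 1; r < n; r++)
            for (size_t c = 0; c < n; c++)
                if (c != col)
                    minor[k++] = a[r * n + c];
        det += sign * a[col] * det_laplace(minor, n - 1);
        sign = -sign;
    }
    return det;
}
```

For the 3x3 matrix {6,1,1, 4,-2,5, 2,8,7} the expansion is 6*(-54) - 1*18 + 1*36 = -306.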
VERSION 1,2:
PROCEDURE: Laplace3x3_v1SSE
           Laplace3x3_v2SSE
           Laplace4x4_v2SSE
FILE:      Laplace3x3_v1SSE.inc
           Laplace3x3_v2SSE.inc
           Laplace4x4_v2SSE.inc

VERSION 3:
PROCEDURE: GaussMethod_v3SSE
FILE:      GaussMethod_v3SSE.inc
MACROS:    CopyMatrixXtoW_v3SSE.mac
           CleanTriangularInfW_v3SSE.mac
           TryTheBestPivot_v3SSE.mac

VERSION 2:
PROCEDURE: GaussMethod_v2SSE
FILE:      GaussMethod_v2SSE.inc
MACROS:    CopyMatrixXtoW_v2SSE.mac
           CleanTriangularInfW_v2SSE.mac
           TryTheBestPivot_v2SSE.mac

VERSION FPU:
PROCEDURES: GaussMethod_v1FPU
            Laplace2x2_v1FPU
            Laplace3x3_v1FPU
            Laplace4x4_v1FPU
            Laplace5x5_v1FPU
            Laplace6x6_v1FPU
FILES:      GaussMethod_v1FPU.inc
            Laplace2x2_v1FPU.inc
            Laplace3x3_v1FPU.inc
            Laplace4x4_v1FPU.inc
            Laplace5x5_v1FPU.inc
            Laplace6x6_v1FPU.inc

DOCUMENTATION: TEXT_ABOUT_DETERMINANT_SSE_REAL4.txt
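The macro names in VERSION 2 and 3 (CopyMatrixXtoW, TryTheBestPivot, CleanTriangularInfW) suggest the usual steps of Gaussian elimination with partial pivoting: copy X into a work matrix W, choose the largest pivot in the current column, and clear the lower triangle. The scalar C sketch below shows those steps in single precision under that assumption; it is not the SSE code in GaussMethod_v2SSE/v3SSE, and det_gauss and GAUSS_MAX_N are hypothetical names.

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

#define GAUSS_MAX_N 16   /* hypothetical work-matrix size limit */

/* Determinant of the n x n row-major matrix x by Gaussian elimination
   with partial pivoting. Requires 1 <= n <= GAUSS_MAX_N, else NAN. */
float det_gauss(const float *x, size_t n)
{
    float w[GAUSS_MAX_N * GAUSS_MAX_N];
    if (n == 0 || n > GAUSS_MAX_N)
        return NAN;

    /* Copy X into the work matrix W so the input is left untouched. */
    memcpy(w, x, n * n * sizeof(float));

    float det = 1.0f;
    for (size_t k = 0; k < n; k++) {
        /* Pick the row at or below k with the largest |w[r][k]|. */
        size_t p = k;
        for (size_t r = k + 1; r < n; r++)
            if (fabsf(w[r * n + k]) > fabsf(w[p * n + k]))
                p = r;
        if (w[p * n + k] == 0.0f)
            return 0.0f;                 /* zero column: singular matrix */

        if (p != k) {
            /* Swapping two rows flips the sign of the determinant. */
            for (size_t c = 0; c < n; c++) {
                float t = w[k * n + c];
                w[k * n + c] = w[p * n + c];
                w[p * n + c] = t;
            }
            det = -det;
        }

        /* Clear the entries below the pivot (lower triangle). */
        for (size_t r = k + 1; r < n; r++) {
            float f = w[r * n + k] / w[k * n + k];
            for (size_t c = k + 1; c < n; c++)
                w[r * n + c] -= f * w[k * n + c];
            w[r * n + k] = 0.0f;
        }

        /* det of a triangular matrix = product of its diagonal. */
        det *= w[k * n + k];
    }
    return det;
}
```

Picking the largest available pivot keeps the multipliers f at magnitude 1 or less, which limits rounding growth in REAL4 arithmetic.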
MATRIX DEFINITION: Every matrix (matrixX below) must be defined like this:
ALIGN 16
dd ?
dd ?
dd M ; <<--- number of columns
dd M ; <<--- number of lines
matrixX dd (M*M) dup (?) ; M*M REAL4 elements; ALIGN 16 + 4 header dwords keep matrixX 16-byte aligned
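With this layout the header sits just below the data: the column count is at [matrixX-8] and the line count at [matrixX-4], which is presumably how the procedures find M. The four header dwords total 16 bytes, so after ALIGN 16 the first element of matrixX is itself 16-byte aligned. A hypothetical C mirror of the same layout (M fixed at 3, struct and field names invented for illustration) makes those offsets checkable at compile time:

```c
#include <stddef.h>
#include <stdint.h>

#define M 3   /* example size; in the MASM source M is a constant */

/* Hypothetical C mirror of the MASM definition. */
typedef struct {
    _Alignas(16) uint32_t reserved[2];  /* the two "dd ?" dwords */
    uint32_t columns;                   /* dd M: number of columns */
    uint32_t lines;                     /* dd M: number of lines (rows) */
    float    data[M * M];               /* matrixX: M*M REAL4 values */
} MatrixX;

_Static_assert(offsetof(MatrixX, data) == 16, "data starts 16 bytes after the aligned header");
_Static_assert(offsetof(MatrixX, data) - offsetof(MatrixX, columns) == 8, "columns at [matrixX-8]");
_Static_assert(offsetof(MatrixX, data) - offsetof(MatrixX, lines) == 4, "lines at [matrixX-4]");

/* Example instance: 3x3 identity. */
static MatrixX matrixX = { {0, 0}, M, M, {1, 0, 0, 0, 1, 0, 0, 0, 1} };
```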
Please test it on your CPU (i5/i7/AMD).
Run ExecuteTestDeterminant_v2.bat and post the resulting file ResultsTestDeterminant_v2.txt.
Good luck
RuiLoureiro

Some results:
Siekmanski:
***** Time table - LoopCount =100 000 *****
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)
3 cycles, Laplace2x2_v1FPU, MatrixX2x2
9 cycles, GaussMethod_v3SSE, MatrixX2x2 <<<-- use Laplace
10 cycles, GaussMethod_v2SSE, MatrixX2x2
223 cycles, GaussMethod_v1FPU, MatrixX2x2
16 cycles, Laplace3x3_v1FPU, MatrixAX3x3
17 cycles, GaussMethod_v3SSE, MatrixAX3x3 <<<-- use Laplace
19 cycles, Laplace3x3_v1SSE, MatrixAX3x3
19 cycles, GaussMethod_v2SSE, MatrixAX3x3
316 cycles, GaussMethod_v1FPU, MatrixAX3x3
18 cycles, GaussMethod_v3SSE, MatrixX3x3 <<<-- use Laplace
19 cycles, GaussMethod_v2SSE, MatrixX3x3
19 cycles, Laplace3x3_v1FPU, MatrixX3x3
285 cycles, GaussMethod_v1FPU, MatrixX3x3
63 cycles, Laplace4x4_v2SSE, MatrixC4x4
69 cycles, Laplace4x4_v1FPU, MatrixC4x4
201 cycles, GaussMethod_v2SSE, MatrixC4x4
201 cycles, GaussMethod_v3SSE, MatrixC4x4
480 cycles, GaussMethod_v1FPU, MatrixC4x4
250 cycles, GaussMethod_v3SSE, MatrixX5x5
260 cycles, GaussMethod_v2SSE, MatrixX5x5
593 cycles, GaussMethod_v1FPU, MatrixX5x5
646 cycles, Laplace5x5_v1FPU, MatrixX5x5
385 cycles, GaussMethod_v2SSE, MatrixX6x6
576 cycles, GaussMethod_v3SSE, MatrixX6x6
833 cycles, GaussMethod_v1FPU, MatrixX6x6
4552 cycles, Laplace6x6_v1FPU, MatrixX6x6
495 cycles, GaussMethod_v3SSE, MatrixX7x7
518 cycles, GaussMethod_v2SSE, MatrixX7x7
1103 cycles, GaussMethod_v1FPU, MatrixX7x7
613 cycles, GaussMethod_v3SSE, MatrixA8x8
643 cycles, GaussMethod_v2SSE, MatrixA8x8
1465 cycles, GaussMethod_v1FPU, MatrixA8x8
789 cycles, GaussMethod_v3SSE, MatrixX9x9
845 cycles, GaussMethod_v2SSE, MatrixX9x9
1888 cycles, GaussMethod_v1FPU, MatrixX9x9
994 cycles, GaussMethod_v3SSE, MatrixX10x10
1066 cycles, GaussMethod_v2SSE, MatrixX10x10
2384 cycles, GaussMethod_v1FPU, MatrixX10x10
1236 cycles, GaussMethod_v3SSE, MatrixX11x11
1299 cycles, GaussMethod_v2SSE, MatrixX11x11
3051 cycles, GaussMethod_v1FPU, MatrixX11x11
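In this table the Laplace routines are the fastest up to 4x4 and then fall behind (646 vs 250 cycles for 5x5, 4552 vs 833 for 6x6 against GaussMethod_v1FPU). A rough way to see why is to count multiplications: cofactor expansion grows roughly like n!, elimination like n^3/3. The sketch below counts them under idealized assumptions (no loads, stores, pivot search or divisions counted), so it shows a trend, not a cycle prediction; laplace_mults and gauss_mults are hypothetical names.

```c
#include <stdint.h>

/* Multiplications for cofactor expansion down to 1x1 minors:
   L(1) = 0, L(n) = n * (L(n-1) + 1). */
uint64_t laplace_mults(unsigned n)
{
    if (n <= 1)
        return 0;
    return (uint64_t)n * (laplace_mults(n - 1) + 1);
}

/* Multiplications for Gaussian elimination: the row updates,
   sum of j^2 for j = 1..n-1, plus n-1 for the diagonal product. */
uint64_t gauss_mults(unsigned n)
{
    if (n == 0)
        return 0;
    uint64_t total = 0;
    for (unsigned j = 1; j < n; j++)
        total += (uint64_t)j * j;
    return total + (n - 1);
}
```

For n = 3 this gives 9 multiplications for the expansion versus 7 (plus 3 divisions) for elimination, so the branch-free Laplace code can win; for n = 6 it is 1236 versus 60 (plus 15 divisions).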
Hi Rui,
Here are the results.
Hi Rui!
Something looks strange: Laplace2x2_v1FPU has 11 instructions but executes in 5 cycles? :dazzled:
Quote from: HSE on September 18, 2018, 09:49:03 PM
Hi Rui!
Something looks strange: Laplace2x2_v1FPU has 11 instructions but executes in 5 cycles? :dazzled:
And on Siekmanski's i7, only 3! :icon_exclaim:
Obviously there is a problem in COUNTER_CYCLE 8)
Quote from: HSE on September 19, 2018, 08:22:10 AM
Obviously there is a problem in COUNTER_CYCLE 8)
Hi HSE,
Did you read the file TestDeterminant_v2.asm?
The file Timing100000.inc is used here exactly as it was used in all the other cases before.
I will try to find where the problem is.
Quote from: RuiLoureiro on September 19, 2018, 08:41:37 AM
Hi HSE,
Did you read the file TestDeterminant_v2.asm?
Yes, Rui. I played a little with it.
Serialization is beyond my present knowledge, but changing where serialization is forced (by CPUID or RDTSCP) results in different counts.
A difference of a few cycles is not expected to matter much when measuring a more complex process, but for an 11-instruction process it is a big problem :biggrin:
Have fun :t
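Timing100000.inc is not shown in this thread, so the following is only a generic sketch of the kind of measurement being discussed, not its contents. It times LOOP_COUNT calls between fenced timestamp reads, subtracts an empty-loop baseline, and divides by the count. Because successive calls can overlap on an out-of-order core, the per-call figure is closer to throughput than to latency, which is one plausible reason an 11-instruction routine can report 3 to 5 cycles. And because the result is a difference of two measurements, a few cycles of bias in the baseline is a large fraction of a tiny kernel. LFENCE is used here as the serializing step instead of CPUID (one choice among several); the x86 path assumes GCC or Clang, the non-x86 fallback uses coarse clock() ticks, and all function names are hypothetical.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
#include <x86intrin.h>
/* Start stamp: LFENCE before RDTSC waits for earlier instructions;
   LFENCE after keeps the timed code from starting before the read. */
static uint64_t stamp_start(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
/* End stamp: RDTSCP waits for earlier instructions to execute;
   the trailing LFENCE keeps later instructions from starting early. */
static uint64_t stamp_end(void)
{
    unsigned int aux;               /* receives IA32_TSC_AUX, unused */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
#else
/* Assumed fallback for non-x86 builds: clock() ticks, not cycles. */
static uint64_t stamp_start(void) { return (uint64_t)clock(); }
static uint64_t stamp_end(void)   { return (uint64_t)clock(); }
#endif

#define LOOP_COUNT 100000

static volatile float g_m[4] = {3.0f, 8.0f, 4.0f, 6.0f};
static volatile float g_sink;   /* volatile store keeps the work alive */

static void kernel_det2(void)  { g_sink = g_m[0] * g_m[3] - g_m[1] * g_m[2]; }
static void kernel_empty(void) { g_sink = 0.0f; }

/* Average ticks per call over LOOP_COUNT calls of fn. */
static double avg_ticks(void (*fn)(void))
{
    uint64_t t0 = stamp_start();
    for (int i = 0; i < LOOP_COUNT; i++)
        fn();
    uint64_t t1 = stamp_end();
    return (double)(t1 - t0) / LOOP_COUNT;
}

/* Net estimate: kernel loop minus empty loop. Any bias left in the
   baseline lands directly on the result, which hurts tiny kernels most. */
double estimate_det2_ticks(void)
{
    double base  = avg_ticks(kernel_empty);
    double total = avg_ticks(kernel_det2);
    return total - base;
}
```

Running estimate_det2_ticks() several times, or moving the fences, shows how much a few-cycle shift in the baseline changes the result for a kernel this small.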