Determinant of any real4 matrix NxN

Started by RuiLoureiro, September 18, 2018, 04:45:10 AM


RuiLoureiro

Hi all,
       Here are two SSE versions and one FPU version that compute the
       determinant of any square MxM matrix. There are also SSE and FPU
       versions that use the Laplace method.

Quote
      VERSION 1,2:     
                PROCEDURE:  Laplace3x3_v1SSE
                                      Laplace3x3_v2SSE
                                      Laplace4x4_v2SSE                           
               
                FILE:       Laplace3x3_v1SSE.inc
                               Laplace3x3_v2SSE.inc
                               Laplace4x4_v2SSE.inc                           
      VERSION 3:     
                PROCEDURE:  GaussMethod_v3SSE
               
                FILE:             GaussMethod_v3SSE.inc

                MACROS:     CopyMatrixXtoW_v3SSE.mac
                                   CleanTriangularInfW_v3SSE.mac
                                   TryTheBestPivot_v3SSE.mac
               
      VERSION 2:     
                PROCEDURE:  GaussMethod_v2SSE
               
                FILE:             GaussMethod_v2SSE.inc

                MACROS:     CopyMatrixXtoW_v2SSE.mac
                                   CleanTriangularInfW_v2SSE.mac
                                   TryTheBestPivot_v2SSE.mac

      VERSION FPU:     
                PROCEDURES: GaussMethod_v1FPU

                                    Laplace2x2_v1FPU
                                    Laplace3x3_v1FPU
                                    Laplace4x4_v1FPU
                                    Laplace5x5_v1FPU
                                    Laplace6x6_v1FPU
               
                FILES:          GaussMethod_v1FPU.inc
               
                                    Laplace2x2_v1FPU.inc
                                    Laplace3x3_v1FPU.inc
                                    Laplace4x4_v1FPU.inc
                                    Laplace5x5_v1FPU.inc
                                    Laplace6x6_v1FPU.inc

    DOCUMENTATION:          TEXT_ABOUT_DETERMINANT_SSE_REAL4.txt

    MATRIX DEFINITION:      Every matrixX must be defined like this:

                            ALIGN 16
                            dd ?
                            dd ?
                            dd M   ; <<--- number of columns
                            dd M   ; <<--- number of rows
              matrixX  dd (M*M) dup (?)         
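
For readers who want to build test matrices from a higher-level language, the layout above can be mirrored in C. This is only a sketch under assumptions: the names `MatrixBlock`, `matrix_block_new`, `matrix_cols` and `matrix_rows` are hypothetical, the purpose of the two leading `dd ?` slots is not stated in the post, and row-major element order is assumed. One thing does follow directly from the definition: the header is four dwords (16 bytes), so with `ALIGN 16` in front of it the first element at `matrixX` is also 16-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C view of the MASM layout: four dwords precede the label
   matrixX and the REAL4 elements follow immediately after them. */
typedef struct {
    uint32_t reserved[2]; /* the two "dd ?" slots; purpose not stated in the post */
    uint32_t cols;        /* dd M : number of columns */
    uint32_t rows;        /* dd M : number of rows    */
    float    data[];      /* matrixX: rows*cols REAL4 values (row-major assumed) */
} MatrixBlock;

/* The header must be exactly 16 bytes for data[] to stay 16-byte aligned. */
_Static_assert(sizeof(MatrixBlock) == 16, "header must be four dwords");

/* Allocate a zeroed M x M block on a 16-byte boundary (mirrors ALIGN 16). */
MatrixBlock *matrix_block_new(uint32_t m) {
    assert(m > 0);
    size_t bytes = sizeof(MatrixBlock) + (size_t)m * m * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u; /* aligned_alloc wants a multiple of 16 */
    MatrixBlock *b = aligned_alloc(16, bytes);
    if (b == NULL) return NULL;
    memset(b, 0, bytes);
    b->cols = m;
    b->rows = m;
    return b;
}

/* Read the dimensions back from a pointer to the first element, as a
   routine that only receives the address of matrixX would have to do. */
uint32_t matrix_cols(const float *x) { return ((const uint32_t *)x)[-2]; }
uint32_t matrix_rows(const float *x) { return ((const uint32_t *)x)[-1]; }
```

Storing the dimensions at negative offsets lets a routine take a single pointer and still discover the matrix size, which is presumably why the header sits in front of the label.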
   
    Please test it on your CPU (i5/i7/AMD).
    Use ExecuteTestDeterminant_v2.bat and post the file ResultsTestDeterminant_v2.txt.

Good luck
RuiLoureiro

Some results:
Quote
Siekmanski:
***** Time table - LoopCount =100 000 *****

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

   3  cycles, Laplace2x2_v1FPU,  MatrixX2x2
   9  cycles, GaussMethod_v3SSE, MatrixX2x2    <<<-- use Laplace
  10  cycles, GaussMethod_v2SSE, MatrixX2x2
 223  cycles, GaussMethod_v1FPU, MatrixX2x2

  16  cycles, Laplace3x3_v1FPU,  MatrixAX3x3
  17  cycles, GaussMethod_v3SSE, MatrixAX3x3   <<<-- use Laplace
  19  cycles, Laplace3x3_v1SSE,  MatrixAX3x3
  19  cycles, GaussMethod_v2SSE, MatrixAX3x3
 316  cycles, GaussMethod_v1FPU, MatrixAX3x3

  18  cycles, GaussMethod_v3SSE, MatrixX3x3    <<<-- use Laplace
  19  cycles, GaussMethod_v2SSE, MatrixX3x3
  19  cycles, Laplace3x3_v1FPU,  MatrixX3x3
 285  cycles, GaussMethod_v1FPU, MatrixX3x3

  63  cycles, Laplace4x4_v2SSE,  MatrixC4x4
  69  cycles, Laplace4x4_v1FPU,  MatrixC4x4
 201  cycles, GaussMethod_v2SSE, MatrixC4x4
 201  cycles, GaussMethod_v3SSE, MatrixC4x4
 480  cycles, GaussMethod_v1FPU, MatrixC4x4

 250  cycles, GaussMethod_v3SSE, MatrixX5x5
 260  cycles, GaussMethod_v2SSE, MatrixX5x5
 593  cycles, GaussMethod_v1FPU, MatrixX5x5
 646  cycles, Laplace5x5_v1FPU,  MatrixX5x5

 385  cycles, GaussMethod_v2SSE, MatrixX6x6
 576  cycles, GaussMethod_v3SSE, MatrixX6x6
 833  cycles, GaussMethod_v1FPU, MatrixX6x6
4552  cycles, Laplace6x6_v1FPU,  MatrixX6x6

 495  cycles, GaussMethod_v3SSE, MatrixX7x7
 518  cycles, GaussMethod_v2SSE, MatrixX7x7
1103  cycles, GaussMethod_v1FPU, MatrixX7x7

 613  cycles, GaussMethod_v3SSE, MatrixA8x8
 643  cycles, GaussMethod_v2SSE, MatrixA8x8
1465  cycles, GaussMethod_v1FPU, MatrixA8x8

 789  cycles, GaussMethod_v3SSE, MatrixX9x9
 845  cycles, GaussMethod_v2SSE, MatrixX9x9
1888  cycles, GaussMethod_v1FPU, MatrixX9x9

 994  cycles, GaussMethod_v3SSE, MatrixX10x10
1066  cycles, GaussMethod_v2SSE, MatrixX10x10
2384  cycles, GaussMethod_v1FPU, MatrixX10x10

1236  cycles, GaussMethod_v3SSE, MatrixX11x11
1299  cycles, GaussMethod_v2SSE, MatrixX11x11
3051  cycles, GaussMethod_v1FPU, MatrixX11x11
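
The crossover in these timings matches the operation counts of the two methods: cofactor (Laplace) expansion grows roughly like n! multiplications, while Gaussian elimination grows roughly like n^3/3, which would explain why Laplace5x5 and Laplace6x6 fall behind. The plain C sketch below shows both techniques for reference. It is not the code from the .inc files; the names `det3_laplace` and `det_gauss` are hypothetical, row-major storage is assumed, and the partial pivoting is only a guess at what TryTheBestPivot does.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Laplace (cofactor) expansion along the first row of a 3x3 matrix
   stored row-major in a[0..8]. */
float det3_laplace(const float a[9]) {
    return a[0] * (a[4] * a[8] - a[5] * a[7])
         - a[1] * (a[3] * a[8] - a[5] * a[6])
         + a[2] * (a[3] * a[7] - a[4] * a[6]);
}

/* Gaussian elimination with partial pivoting. Works in place on w
   (n*n floats, row-major), so pass a working copy of the matrix. */
float det_gauss(float *w, size_t n) {
    assert(w != NULL && n > 0);
    float det = 1.0f;
    for (size_t k = 0; k < n; ++k) {
        /* Pick the row with the largest |entry| in column k as the pivot row. */
        size_t p = k;
        for (size_t i = k + 1; i < n; ++i)
            if (fabsf(w[i * n + k]) > fabsf(w[p * n + k])) p = i;
        if (w[p * n + k] == 0.0f) return 0.0f; /* whole column is zero: singular */
        if (p != k) {
            /* Swapping two rows flips the sign of the determinant. */
            for (size_t j = k; j < n; ++j) {
                float t = w[k * n + j];
                w[k * n + j] = w[p * n + j];
                w[p * n + j] = t;
            }
            det = -det;
        }
        float pivot = w[k * n + k];
        det *= pivot; /* determinant = signed product of the pivots */
        /* Eliminate column k from the rows below the pivot row. */
        for (size_t i = k + 1; i < n; ++i) {
            float f = w[i * n + k] / pivot;
            for (size_t j = k + 1; j < n; ++j)
                w[i * n + j] -= f * w[k * n + j];
        }
    }
    return det;
}
```

For the matrix {2,1,1, 1,3,2, 1,0,0} both routines should agree on a determinant of -1, up to single-precision rounding in the elimination version.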

HSE

Hi Rui!

Something looks strange: Laplace2x2_v1FPU has 11 instructions but executes in 5 cycles?  :dazzled:



Equations in Assembly: SmplMath

RuiLoureiro

Quote from: HSE on September 18, 2018, 09:49:03 PM
Hi Rui!

Something looks strange: Laplace2x2_v1FPU has 11 instructions but executes in 5 cycles?  :dazzled:
And on Siekmanski's i7, only 3! :icon_exclaim:

HSE

Obviously there is a problem in COUNTER_CYCLE  8)

RuiLoureiro

Quote from: HSE on September 19, 2018, 08:22:10 AM
Obviously there is a problem in COUNTER_CYCLE  8)
Hi HSE,
          Did you read the file TestDeterminant_v2.asm?

          The file Timing100000.inc is used here exactly as it has been
          used in all earlier tests.
          I will try to find where the problem is.

HSE

Quote from: RuiLoureiro on September 19, 2018, 08:41:37 AM
Hi HSE,
          Did you read the file TestDeterminant_v2.asm?

Yes, Rui. I played with it a little.

Serialization is beyond my present knowledge, but changing where serialization is forced (by CPUID or RDTSCP) results in different counts.

A difference of a few cycles should not matter much when measuring a more complex routine. For an 11-instruction routine it is a big problem  :biggrin:
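
To make the serialization point concrete, here is a minimal C sketch of a fenced timestamp measurement. It is not the contents of Timing100000.inc, which is not shown in this thread; `tsc_begin`, `tsc_end`, `cycles_per_call` and `det2x2_kernel` are hypothetical names, and the fence placement is one common pattern rather than the only valid one. Note also that a figure averaged over a loop is closer to throughput than to latency: an out-of-order CPU can overlap successive iterations, so an average below the instruction count is not automatically a counter bug.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* Start of the timed region: the fences keep earlier work from still
   being in flight, and later work from starting, around the RDTSC read
   (LFENCE behaviour as documented for Intel CPUs). */
static inline uint64_t tsc_begin(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* End of the timed region: RDTSCP waits for earlier instructions to
   execute before reading the counter; the fence stops later ones from
   moving up in front of it. */
static inline uint64_t tsc_end(void) {
    unsigned int aux;
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
#else
#include <time.h>
/* Fallback for other targets: nanoseconds instead of cycles. */
static inline uint64_t tsc_begin(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
static inline uint64_t tsc_end(void) { return tsc_begin(); }
#endif

typedef void (*kernel_fn)(void);

/* Average cost per call over `loops` calls, after subtracting the cost
   of an empty timed region. */
double cycles_per_call(kernel_fn fn, uint32_t loops) {
    assert(fn != NULL && loops > 0);
    uint64_t t0 = tsc_begin();
    uint64_t t1 = tsc_end();
    uint64_t overhead = t1 - t0;

    t0 = tsc_begin();
    for (uint32_t i = 0; i < loops; ++i) fn();
    t1 = tsc_end();

    uint64_t total = t1 - t0;
    total = (total > overhead) ? total - overhead : 0;
    return (double)total / (double)loops;
}

/* Tiny stand-in workload: a 2x2 determinant on volatile data so the
   compiler cannot remove the arithmetic. */
static volatile float in2x2[4] = {1.0f, 2.0f, 3.0f, 4.0f};
static volatile float det2x2_result;

void det2x2_kernel(void) {
    det2x2_result = in2x2[0] * in2x2[3] - in2x2[1] * in2x2[2];
}
```

Calling `cycles_per_call(det2x2_kernel, 100000)` mirrors the LoopCount used in the time table. Moving or removing the fences changes the reported number by a few cycles, which is negligible for a large matrix but a large fraction of a routine this small.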

Have fun  :t
