### Author Topic: Invert Matrix 3x3 for Real8 variables  (Read 2559 times)

#### guga

• Moderator
• Member
• Posts: 1451
• Assembly is a state of art.
##### Invert Matrix 3x3 for Real8 variables
« on: January 24, 2019, 07:15:43 AM »
This function computes the inverted 3x3 matrix formed by a Double FPU (Real8) data.

InvertMatrix_3x3_Double
Code: [Select]
`;;    InvertMatrix_3x3_Double        This function generates the inversion of a 3x3 squared matrix    Parameters:        InputMatrix: A pointer to a 3x3 matrix used as input. The size of the element on each member                      of the matrix must be a douuble (Real8) FPU data.        OutputMatrix: A pointer to a buffer that wil receive the generated 3x3 matrix.                      The size of the buffer must be 72 bytes (equivalent to 9 Real8 FPU variables), representing a 3x3 matrix formed by                      double FPU. So, 8*3*3 = 72.    Return Value:        If the function suceeds it returns TRUE and the OutputMatrix parameter will hold the converted data.        If it fails it returns FALSE meaning that the matrix can not be inverted. It happens when the computed determinant is 0.    Remarks:        A square matrix that is not invertible is called singular or degenerate.        A square matrix is singular if and only if its determinant is 0.    References: https://en.wikipedia.org/wiki/Invertible_matrix    Example of usage:    [TestMatrix1:     Test_M1: R\$ 0.8951   Test_M2: R\$ 0.2664   Test_M3: R\$ -0.1614     Test_M4: R\$ -0.7502  Test_M5: R\$ 1.7135   Test_M6: R\$ 0.0367     Test_M7: R\$ 0.0389   Test_M8: R\$ -0.0685  Test_M9: R\$ 1.0296]    [TestMatrix2: R\$ 0 #9]        call InvertMatrix_3x3_Double TestMatrix1, TestMatrix2;;Proc InvertMatrix_3x3_Double:    Arguments @InputMatrix, @OutputMatrix    Structure @TempStruct 264, @TmpDeterminantDis 0, @TmpOutMatrix1Dis 8, @TmpOutMatrix2Dis 136    Uses edi, esi, ecx, ebx    finit    mov esi D@InputMatrix    lea edi D@TmpOutMatrix1Dis;;[FloatMatrices.M1Dis (a)   FloatMatrices.M2Dis (b) FloatMatrices.M3Dis (c) FloatMatrices.M4Dis (d)   FloatMatrices.M5Dis (e) FloatMatrices.M6Dis (f) FloatMatrices.M7Dis (g)   FloatMatrices.M8Dis (h) FloatMatrices.M9Dis] (i);;    ; A = ei-fh    fld R\$esi+FloatMatrices.M5Dis | fmul R\$esi+FloatMatrices.M9Dis    fld R\$esi+FloatMatrices.M6Dis | fmul R\$esi+FloatMatrices.M8Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M1Dis    ; B = -(di-fg) = fg-di    fld R\$esi+FloatMatrices.M6Dis | fmul R\$esi+FloatMatrices.M7Dis    fld R\$esi+FloatMatrices.M4Dis | fmul R\$esi+FloatMatrices.M9Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M2Dis    ; C = dh-eg    fld R\$esi+FloatMatrices.M4Dis | fmul R\$esi+FloatMatrices.M8Dis    fld R\$esi+FloatMatrices.M5Dis | fmul R\$esi+FloatMatrices.M7Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M3Dis    ; D = -(bi-ch) = ch-bi    fld R\$esi+FloatMatrices.M3Dis | fmul R\$esi+FloatMatrices.M8Dis    fld R\$esi+FloatMatrices.M2Dis | fmul R\$esi+FloatMatrices.M9Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M4Dis    ; E =  ai-cg    fld R\$esi+FloatMatrices.M1Dis | fmul R\$esi+FloatMatrices.M9Dis    fld R\$esi+FloatMatrices.M3Dis | fmul R\$esi+FloatMatrices.M7Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M5Dis    ; F = -(ah-bg) = bg-ah    fld R\$esi+FloatMatrices.M2Dis | fmul R\$esi+FloatMatrices.M7Dis    fld R\$esi+FloatMatrices.M1Dis | fmul R\$esi+FloatMatrices.M8Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M6Dis    ; G = bf-ce    fld R\$esi+FloatMatrices.M2Dis | fmul R\$esi+FloatMatrices.M6Dis    fld R\$esi+FloatMatrices.M3Dis | fmul R\$esi+FloatMatrices.M5Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M7Dis    ; H =  -(af-cd) = cd-af    fld R\$esi+FloatMatrices.M3Dis | fmul R\$esi+FloatMatrices.M4Dis    fld R\$esi+FloatMatrices.M1Dis | fmul R\$esi+FloatMatrices.M6Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M8Dis    ; I = ae-bd    fld R\$esi+FloatMatrices.M1Dis | fmul R\$esi+FloatMatrices.M5Dis    fld R\$esi+FloatMatrices.M2Dis | fmul R\$esi+FloatMatrices.M4Dis | fchs | faddp ST1 ST0    fstp R\$edi+FloatMatrices.M9Dis    ; get determinant    lea eax D@TmpDeterminantDis    call GetDeterminantOfMatrix3x3_Double D@InputMatrix, eax    On eax = 0, ExitP ; Matrix not invertible    fld1 | fdiv R@TmpDeterminantDis | fstp R@TmpDeterminantDis    lea ebx D@TmpOutMatrix2Dis    call Fast_MatrixTranspose_Double edi, ebx, 3, 3    mov edi D@OutputMatrix    xor ecx ecx    Do        fld R\$ebx+ecx*8 | fmul R@TmpDeterminantDis | fstp R\$edi+ecx*8        inc ecx    Loop_Until ecx => 9    mov eax &TRUEEndP`

GetDeterminantOfMatrix3x3_Double
Code: [Select]
`__________________________________________________________________________________________________;;    GetDeterminantOfMatrix3x3_Double        This function retrieves the determinant of a 3x3 matrix    Parameters:            pMatrix - The pointer to a 3x3 squared matrix used as input from where we want to find itÂ´s determinant                  The size of each element of the matrix must be a Double FPU data (Real8)        pDeterminant - A pointer to a buffer to store the calculated determinant in Real8 data. The size of the buffer must be in                   double FPU (Real8).    Return Value:   If the function suceeds, it returns TRUE and also store the determinant value on the output buffer (pDeterminant).                    If the function fails, it returns FALSE meaning that the square matrix is singular and also if you are using the function                    to determine the inverse of a matrix, a FALSE result can also be used to confirm that the matrix can not be inverted.;;Proc GetDeterminantOfMatrix3x3_Double:    Arguments @pMatrix, @pDeterminant    Structure @DeterminantData 8, @DeterminantDataDis 0    Uses esi, edi    mov esi D@pMatrix    fld R\$esi+FloatMatrices.M1Dis    fld R\$esi+FloatMatrices.M5Dis | fld R\$esi+FloatMatrices.M9Dis | fmulp ST1 ST0    fld R\$esi+FloatMatrices.M6Dis | fld R\$esi+FloatMatrices.M8Dis | fmulp ST1 ST0    fchs    faddp ST1 ST0    fmulp ST1 ST0    fld R\$esi+FloatMatrices.M2Dis    fchs    fld R\$esi+FloatMatrices.M4Dis | fld R\$esi+FloatMatrices.M9Dis | fmulp ST1 ST0    fld R\$esi+FloatMatrices.M6Dis | fld R\$esi+FloatMatrices.M7Dis | fmulp ST1 ST0    fchs    faddp ST1 ST0    fmulp ST1 ST0    fld R\$esi+FloatMatrices.M3Dis    fld R\$esi+FloatMatrices.M4Dis | fld R\$esi+FloatMatrices.M8Dis | fmulp ST1 ST0    fld R\$esi+FloatMatrices.M5Dis | fld R\$esi+FloatMatrices.M7Dis | fmulp ST1 ST0    fchs    faddp ST1 ST0    fmulp ST1 ST0    faddp ST1 ST0    faddp ST1 ST0    fstp R@DeterminantDataDis    mov edi D@pDeterminant    Fpu_If R@DeterminantDataDis = R\$Float_Zero        fldz | fstp R\$edi ; make sure it is zero with any other rounding down value        xor eax eax    Fpu_Else        fld R@DeterminantDataDis | fstp R\$edi        mov eax &TRUE    Fpu_End_IfEndP`
Fast_MatrixTranspose_Double
Code: [Select]
`;;    Fast_MatrixTranspose_Double        This function transposes a marix with any size n*n            Parameters:                    Input - A pointer to a matrix of any size (width and height) to be transposed.                    The elements opf the matrix must be a Real8 (Double) each.            Output - A pointer to a buffer to receive the transposed data.                     The size of the buffer must be at least 128 bytes (4*4*8 = Width(4)*Height(4)*8(size of Real8FPU)).                     Also, the buffer must be aligned to 32 bytes (4 Real8) so it can compute the resultant transpose                     properly without affecting the data that are located after the end of the matrix.                     In other words, If your matrix have a size of 5*5, it means that you will need a buffer of                     200 Bytes (5*5*8) plus 24 extra bytes (3 Real8 Fpu data), since the internal computation of the transposition matrix                     work from 4*4 data each. So, if the matrix size is not a multiple of you will need more Real8 Fpu on the buffer to complete.                                Width - The width of the matrix (an integer)            Height - The height of the matrix (an integer)            Return Values: On exit the function will return the pointer to the start of the transposed matrix.                           In other words, it will return the same start address of the buffer you settled on Output parameter                           but will then contain the transposed matrix            Example of usage:            [Teste6x4:  R\$  1,  2,  3,  4,  5, 6,                        R\$ 26,  7,  8,  9, 10,27                        R\$ 11, 12, 13, 14, 15,28                        R\$ 16, 17, 18, 19, 20,29]            [MyMatrixBuffer: R\$ 0 #(6*4)]; No need for padding buffers, since it is a multiple of 4                call Fast_MatrixTranspose_Double Teste6x4, ebx, 6, 4            [Teste5x5:  R\$  1,  2,  3,  4,  5,                        R\$ 26,  7,  8,  9, 10,                        R\$ 11, 12, 13, 14, 15,                        R\$ 16, 17, 18, 19, 20,                        R\$ 46, 117, 4, 129, 23,]            [MyMatrixBuffer: R\$ 0 #(5*5)             PaddingBuffer: R\$ 0 #3] ; Extra 3 Real8 FPU to complete the buffer size                call Fast_MatrixTranspose_Double Teste6x4, ebx, 6, 4    Reference:        http://masm32.com/board/index.php?topic=6105.15;;Proc Fast_MatrixTranspose_Double:    Arguments @Input, @Output, @Width, @Height    Local @CurXPos, @RemainderY, @MaxYPos    Uses esi, edi, ebx, ecx, edx    mov esi D@Input    mov edi D@Output    ; get remainders for edi    mov D@RemainderY 0    mov edx D@Height | mov ecx edx | shr edx 2 | mov D@MaxYPos edx | and ecx 3; Check if value (Height) is a multiple of 4    jz L1>        ; found not multiple of 4        inc D@MaxYPos ; if height have a remainder (i mean, not multiple of 4, increment it)        mov eax 4 | sub eax ecx | shl eax 3        mov D@RemainderY eaxL1:    mov eax D@Width |  mov D@CurXPos eax | shl eax 3 | lea ebx D\$eax+eax*2 ; muylby 3; ebx = Width*4*3 . Width*8*3L2:    mov ecx D@MaxYPos    mov edx esi    Align 16 ; <---- Must be aligned to 16 to gain more speed and stability    L8:         ; copy the 1st 4 Qwords from esi to register XMM        movq XMM1 Q\$edx+eax        movq XMM0 Q\$edx        pslldq xmm1 8        xorps XMM0 XMM1        movupd X\$edi xmm0        movq XMM1 Q\$edx+ebx        pslldq xmm1 8        movq XMM0 Q\$edx+eax*2        xorps XMM0 XMM1        movupd X\$edi+16 xmm0        lea edx D\$edx+eax*4        add edi (8*4) ; advance one xmm reg        dec ecx | jg L8<    sub edi D@RemainderY; adjust edi from the remainder in YPos only    add esi 8    dec D@CurXPos | jnz L2<<    mov eax D@RemainderY    ; clear remainder bytes if any    test eax eax | jz L1>        shr eax 3 | jz L1>  | mov D\$edi 0 | mov D\$edi+4 0        dec eax | jz L1>    | mov D\$edi+8 0 | mov D\$edi+12 0        dec eax | jz L1>    | mov D\$edi+16 0 | mov D\$edi+20 0L1:    mov eax D@OutputEndP`
Extra Macros to compute the FPU comparitions

Code: [Select]
`[Fpu_Do | C0:][Fpu_Loop_Until | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 C0<<][.Fpu_Do | C1:][.Fpu_Loop_Until | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 C1<<][Fpu_While | B0: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 B1>>][Fpu_End_While | jmp B0<< | B1:][.Fpu_While | B2: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 B3>>][.Fpu_End_While | jmp B2<< | B3:][Fpu_If | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R0>>][Fpu_Else_If | jmp R5>> | R0: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R0>>][Fpu_Else | jmp R5>> | R0:][Fpu_End_If | R0: | R5:][.Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R1>>][.Fpu_Else_If | jmp R6>> | R1: |  fld #3 | fld #1| fcompp | fstsw ax | fwait | sahf | jn#2 R1>>][.Fpu_Else | jmp R6>> | R1:][.Fpu_End_If | R1: | R6:][..Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R2>>][..Fpu_Else_If | jmp R7>> | R2: |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R2>>][..Fpu_Else | jmp R7>> | R2:][..Fpu_End_If | R2: | R7:][...Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R3>>][...Fpu_Else_If | jmp R8>> | R3: |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R3>>][...Fpu_Else | jmp R8>> | R3:][...Fpu_End_If | R3: | R8:][Fpu_If_And    | Fpu_If #1 #2 #3    | #+3][.Fpu_If_And   | .Fpu_If #1 #2 #3   | #+3][..Fpu_If_And  | ..Fpu_If #1 #2 #3  | #+3][...Fpu_If_And | ...Fpu_If #1 #2 #3 | #+3][Fpu_Else_If_And    | Fpu_Else    | Fpu_If_And    #F>L][.Fpu_Else_If_And   | .Fpu_Else   | .Fpu_If_And   #F>L][..Fpu_Else_If_And  | ..Fpu_Else  | ..Fpu_If_And  #F>L][...Fpu_Else_If_And | ...Fpu_Else | ...Fpu_If_And #F>L]`
Many tks to Rui, JJ and Siekmanski :t :t :t :t :t

Btw....If someone have a faster code that also can invert a squared matrix on any size, please post :) (And if you guys have, please post both versions, one using Real8 and other using Real4 Fpu ;) )
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com