Chromatic Adaptation Function

Started by guga, January 24, 2019, 02:01:47 PM

guga

Hi guys, I created a function that transforms from one illuminant model to another. You can use it to convert, for example, from D65 to F12, or from illuminant A (2° observer) to illuminant C (10° observer), using whichever method you want: Bradford, Bianco, Von Kries, Estevez, Sharp, CAT02, etc.

This is useful when converting an RGB pixel to the CieLab, XYZ or CieLCH colorspaces and you want to switch between the different illuminant references of the image. Currently, the function supports 40 different illuminant models and 18 different methods.
So you can transform/adapt a given colorspace matrix in 720 (40 x 18) different ways. For example, if you are converting RGB to CieLab using the D65 tristimulus matrix (D65, 2° observer):
[0.4124564, 0.3575761, 0.1804375
0.2126729, 0.7151522, 0.0721750
0.0193339, 0.1191920, 0.9503041]

It means that with GetChromaticAdaptationMatrix you can create 720 new matrices derived from the D65 one ;)

The current function only generates the matrix needed to perform this kind of conversion; it does not convert the tristimulus values themselves. I'll try to create another function later to convert the tristimulus directly.
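
For readers new to the term, converting the tristimulus itself just means multiplying such a 3x3 matrix by an [R, G, B] or [X, Y, Z] column vector. A minimal Python sketch, not part of the attached RosAsm source: `apply_matrix` is a hypothetical helper, the matrix is the D65 one quoted above, and it is assumed here to be the usual linear sRGB to XYZ matrix.

```python
# Row-major 3x3 matrix quoted in the post (D65, 2-degree observer).
# Assumption: it maps linear RGB to XYZ.
RGB_TO_XYZ_D65 = [
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
]


def apply_matrix(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


# Linear RGB white (1, 1, 1) gives the row sums of the matrix,
# which is the white point the matrix was built for.
white_xyz = apply_matrix(RGB_TO_XYZ_D65, [1.0, 1.0, 1.0])
```

With an adapted matrix in place of `RGB_TO_XYZ_D65`, the same multiplication yields the tristimulus under the new illuminant.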

The text is rather big for a post, so I attached the full file containing the complete function and the data it uses.

Basically the function looks like this:



Proc GetChromaticAdaptationMatrix:
    Arguments @pOutMatrix, @WhiteRefFrom, @WhiteRefTo, @Method
    Local @pCurMatrixFrom
    Structure @Reference 264, @pRefXFromDis 0, @pRefYFromDis 8, @pRefZFromDis 16, @pRefXtoDis 24, @pRefYtoDis 32, @pRefZtoDis 40, @XScaledDis 48, @YScaledDis 56, @ZScaledDis 64,
                              @SigmaFromDis 72, @SigmaToDis 96, @TmpInvertedMatrixDis 120, @TmpScaledMatrixDis 192
    Uses ebx, ecx, edx, edi

    finit

    mov eax 0-2
    On D@Method > ADAPT_LIE, ExitP

    ; The very first thing to do is get the pointer to the matrix selected by the input and calculate its inverse. If the matrix is invertible we can continue

    mov ebx D@Method | imul ebx Size_Of_FloatMatrices | add ebx FloatXYZAdaptMatrices | mov D@pCurMatrixFrom ebx

    ; get the inverted matrix from where you want to compute from
    lea ebx D@TmpInvertedMatrixDis
    call InvertMatrix_3x3_Double D@pCurMatrixFrom, ebx
    If eax = 0 ; Square matrix is not invertible, exit
        mov eax 0-1 ; Put the proper error value on return
        ExitP
    End_If
    ; 2nd - Get the white references so we can compute the ratio RefTo / RefFrom

    lea edx D@pRefZFromDis
    lea ebx D@pRefYFromDis
    lea eax D@pRefXFromDis
    call FindWhiteRefEx eax, ebx, edx, D@WhiteRefFrom
    On eax = 0, ExitP ; the given illuminant model does not exist

    lea edx D@pRefZtoDis
    lea ebx D@pRefYtoDis
    lea eax D@pRefXtoDis
    call FindWhiteRefEx eax, ebx, edx, D@WhiteRefto
    On eax = 0, ExitP ; the given illuminant model does not exist


    ; 3rd - Multiply [Matrix3x3] by [3x1]. So, multiply the matrix obtained from the input by each white reference (From and To)
    ; [Matrix3x1] is simply the white reference written as a 3x1 column vector [X, Y, Z] (vertical order: X top, Z bottom)
    mov ebx D@pCurMatrixFrom
    lea edi D@SigmaFromDis
    fld R$ebx+FloatMatrices.M1Dis | fmul R@pRefXFromDis | fld R$ebx+FloatMatrices.M2Dis | fmul R@pRefYFromDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M3Dis | fmul R@pRefZFromDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M1Dis
    fld R$ebx+FloatMatrices.M4Dis | fmul R@pRefXFromDis | fld R$ebx+FloatMatrices.M5Dis | fmul R@pRefYFromDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M6Dis | fmul R@pRefZFromDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M2Dis
    fld R$ebx+FloatMatrices.M7Dis | fmul R@pRefXFromDis | fld R$ebx+FloatMatrices.M8Dis | fmul R@pRefYFromDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M9Dis | fmul R@pRefZFromDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M3Dis

    ; do the same with Refto
    lea edi D@SigmaToDis
    fld R$ebx+FloatMatrices.M1Dis | fmul R@pRefXtoDis | fld R$ebx+FloatMatrices.M2Dis | fmul R@pRefYtoDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M3Dis | fmul R@pRefZtoDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M1Dis
    fld R$ebx+FloatMatrices.M4Dis | fmul R@pRefXtoDis | fld R$ebx+FloatMatrices.M5Dis | fmul R@pRefYtoDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M6Dis | fmul R@pRefZtoDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M2Dis
    fld R$ebx+FloatMatrices.M7Dis | fmul R@pRefXtoDis | fld R$ebx+FloatMatrices.M8Dis | fmul R@pRefYtoDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M9Dis | fmul R@pRefZtoDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M3Dis

    ; divide the results generated for RefTo by the ones for RefFrom to get a ratio. So, Scaled = RefTo/RefFrom
    lea eax D@SigmaFromDis
    fld R$edi+FloatMatrices.M1Dis | fdiv R$eax+FloatMatrices.M1Dis | fstp R@XScaledDis
    fld R$edi+FloatMatrices.M2Dis | fdiv R$eax+FloatMatrices.M2Dis | fstp R@YScaledDis
    fld R$edi+FloatMatrices.M3Dis | fdiv R$eax+FloatMatrices.M3Dis | fstp R@ZScaledDis

    ; 4th - put the scaled ratios in a diagonal matrix of the form:
;     XScaled,     0     0
;         0    YScaled   0
;         0        0     ZScaled

    ; and multiply the inverted matrix obtained from the input by this new 3x3 scaled matrix, which is filled with zeroes elsewhere.
;;
    lea ebx D@TmpInvertedMatrixDis
    lea edi D@TmpScaledMatrixDis
    fld R$ebx+FloatMatrices.M1Dis | fmul R@XScaledDis | fld R$ebx+FloatMatrices.M2Dis | fmul R$FloatZero | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M3Dis | fmul R$FloatZero | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M1Dis
    fld R$ebx+FloatMatrices.M1Dis | fmul R$FloatZero  | fld R$ebx+FloatMatrices.M2Dis | fmul R@YScaledDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M3Dis | fmul R$FloatZero | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M2Dis
    fld R$ebx+FloatMatrices.M1Dis | fmul R$FloatZero  | fld R$ebx+FloatMatrices.M2Dis | fmul R$FloatZero | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M3Dis | fmul R@ZScaledDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M3Dis

    fld R$ebx+FloatMatrices.M4Dis | fmul R@XScaledDis | fld R$ebx+FloatMatrices.M5Dis | fmul R$FloatZero | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M6Dis | fmul R$FloatZero | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M4Dis
    fld R$ebx+FloatMatrices.M4Dis | fmul R$FloatZero  | fld R$ebx+FloatMatrices.M5Dis | fmul R@YScaledDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M6Dis | fmul R$FloatZero | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M5Dis
    fld R$ebx+FloatMatrices.M4Dis | fmul R$FloatZero  | fld R$ebx+FloatMatrices.M5Dis | fmul R$FloatZero | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M6Dis | fmul R@ZScaledDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M6Dis

    fld R$ebx+FloatMatrices.M7Dis | fmul R@XScaledDis | fld R$ebx+FloatMatrices.M8Dis | fmul R$FloatZero | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M9Dis | fmul R$FloatZero | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M7Dis
    fld R$ebx+FloatMatrices.M7Dis | fmul R$FloatZero  | fld R$ebx+FloatMatrices.M8Dis | fmul R@YScaledDis | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M9Dis | fmul R$FloatZero | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M8Dis
    fld R$ebx+FloatMatrices.M7Dis | fmul R$FloatZero  | fld R$ebx+FloatMatrices.M8Dis | fmul R$FloatZero | faddp ST1 ST0 | fld R$ebx+FloatMatrices.M9Dis | fmul R@ZScaledDis | faddp ST1 ST0 | fstp R$edi+FloatMatrices.M9Dis
;;

    ; Since we are multiplying by zeroes we can make it faster by skipping the zero multiplications, after all X*0 = 0. So the commented-out multiplication above reduces to this 3x3 matrix:
    lea ebx D@TmpInvertedMatrixDis
    lea edi D@TmpScaledMatrixDis
    fld R$ebx+FloatMatrices.M1Dis | fmul R@XScaledDis | fstp R$edi+FloatMatrices.M1Dis
    fld R$ebx+FloatMatrices.M2Dis | fmul R@YScaledDis | fstp R$edi+FloatMatrices.M2Dis
    fld R$ebx+FloatMatrices.M3Dis | fmul R@ZScaledDis | fstp R$edi+FloatMatrices.M3Dis

    fld R$ebx+FloatMatrices.M4Dis | fmul R@XScaledDis | fstp R$edi+FloatMatrices.M4Dis
    fld R$ebx+FloatMatrices.M5Dis | fmul R@YScaledDis | fstp R$edi+FloatMatrices.M5Dis
    fld R$ebx+FloatMatrices.M6Dis | fmul R@ZScaledDis | fstp R$edi+FloatMatrices.M6Dis

    fld R$ebx+FloatMatrices.M7Dis | fmul R@XScaledDis | fstp R$edi+FloatMatrices.M7Dis
    fld R$ebx+FloatMatrices.M8Dis | fmul R@YScaledDis | fstp R$edi+FloatMatrices.M8Dis
    fld R$ebx+FloatMatrices.M9Dis | fmul R@ZScaledDis | fstp R$edi+FloatMatrices.M9Dis

    ; 5th - Finally, we simply multiply the scaled inverted 3x3 matrix above by the matrix obtained from the input

    mov ebx D@pCurMatrixFrom
    mov edx D@pOutMatrix
    fld R$edi+FloatMatrices.M1Dis | fmul R$ebx+FloatMatrices.M1Dis | fld R$edi+FloatMatrices.M2Dis | fmul R$ebx+FloatMatrices.M4Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M3Dis | fmul R$ebx+FloatMatrices.M7Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M1Dis
    fld R$edi+FloatMatrices.M1Dis | fmul R$ebx+FloatMatrices.M2Dis | fld R$edi+FloatMatrices.M2Dis | fmul R$ebx+FloatMatrices.M5Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M3Dis | fmul R$ebx+FloatMatrices.M8Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M2Dis
    fld R$edi+FloatMatrices.M1Dis | fmul R$ebx+FloatMatrices.M3Dis | fld R$edi+FloatMatrices.M2Dis | fmul R$ebx+FloatMatrices.M6Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M3Dis | fmul R$ebx+FloatMatrices.M9Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M3Dis

    fld R$edi+FloatMatrices.M4Dis | fmul R$ebx+FloatMatrices.M1Dis | fld R$edi+FloatMatrices.M5Dis | fmul R$ebx+FloatMatrices.M4Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M6Dis | fmul R$ebx+FloatMatrices.M7Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M4Dis
    fld R$edi+FloatMatrices.M4Dis | fmul R$ebx+FloatMatrices.M2Dis | fld R$edi+FloatMatrices.M5Dis | fmul R$ebx+FloatMatrices.M5Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M6Dis | fmul R$ebx+FloatMatrices.M8Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M5Dis
    fld R$edi+FloatMatrices.M4Dis | fmul R$ebx+FloatMatrices.M3Dis | fld R$edi+FloatMatrices.M5Dis | fmul R$ebx+FloatMatrices.M6Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M6Dis | fmul R$ebx+FloatMatrices.M9Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M6Dis

    fld R$edi+FloatMatrices.M7Dis | fmul R$ebx+FloatMatrices.M1Dis | fld R$edi+FloatMatrices.M8Dis | fmul R$ebx+FloatMatrices.M4Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M9Dis | fmul R$ebx+FloatMatrices.M7Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M7Dis
    fld R$edi+FloatMatrices.M7Dis | fmul R$ebx+FloatMatrices.M2Dis | fld R$edi+FloatMatrices.M8Dis | fmul R$ebx+FloatMatrices.M5Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M9Dis | fmul R$ebx+FloatMatrices.M8Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M8Dis
    fld R$edi+FloatMatrices.M7Dis | fmul R$ebx+FloatMatrices.M3Dis | fld R$edi+FloatMatrices.M8Dis | fmul R$ebx+FloatMatrices.M6Dis | faddp ST1 ST0 | fld R$edi+FloatMatrices.M9Dis | fmul R$ebx+FloatMatrices.M9Dis | faddp ST1 ST0 | fstp R$edx+FloatMatrices.M9Dis

    mov eax &TRUE

EndP
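
The steps above can be sketched in a few lines of Python. This is only an illustration of the same arithmetic (method matrix, its inverse, the two white references and the diagonal scale), not a port of the RosAsm routine. The helper names are hypothetical, the Bradford coefficients are the ones used as TestMatrix1 in the InvertMatrix_3x3_Double example of this thread, and the D65/D50 white points (2° observer, Y normalized to 1) are commonly quoted values assumed here rather than taken from the attached tables.

```python
# Bradford cone-response matrix (same coefficients as TestMatrix1 in this thread).
BRADFORD = [
    [0.8951, 0.2664, -0.1614],
    [-0.7502, 1.7135, 0.0367],
    [0.0389, -0.0685, 1.0296],
]

# Assumed white points (X, Y, Z), 2-degree observer, Y normalized to 1.
D65 = (0.95047, 1.00000, 1.08883)
D50 = (0.96422, 1.00000, 0.82521)


def mat_vec(m, v):
    """3x3 matrix times 3x1 column vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def mat_mul(x, y):
    """3x3 matrix product x * y."""
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def invert3(m):
    """Inverse via the adjugate; returns None when the determinant is 0."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        return None
    adj = [  # transposed cofactor matrix
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def adaptation_matrix(method, white_from, white_to):
    """Return inverse(method) * diag(to/from) * method, or None if singular."""
    inv = invert3(method)
    if inv is None:
        return None
    sigma_from = mat_vec(method, white_from)  # 3rd step, "From" reference
    sigma_to = mat_vec(method, white_to)      # 3rd step, "To" reference
    scale = [t / f for t, f in zip(sigma_to, sigma_from)]
    # Multiplying by a diagonal matrix on the right scales the columns.
    scaled_inv = [[inv[r][c] * scale[c] for c in range(3)] for r in range(3)]
    return mat_mul(scaled_inv, method)


d65_to_d50 = adaptation_matrix(BRADFORD, D65, D50)
```

A handy sanity check for any implementation: the resulting matrix should map the source white onto the destination white, and using the same white for both references should give the identity matrix.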
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Additional functions used:

InvertMatrix_3x3_Double


;;
    InvertMatrix_3x3_Double
   
    This function computes the inverse of a 3x3 square matrix

    Parameters:
        InputMatrix: A pointer to a 3x3 matrix used as input. Each element
                     of the matrix must be a double (Real8) FPU value.

        OutputMatrix: A pointer to a buffer that will receive the generated 3x3 matrix.
                      The size of the buffer must be 72 bytes (equivalent to 9 Real8 FPU variables), representing a 3x3 matrix formed by
                      double FPU values. So, 8*3*3 = 72.

    Return Value:
        If the function succeeds it returns TRUE and the OutputMatrix parameter will hold the inverted matrix.
        If it fails it returns FALSE, meaning that the matrix cannot be inverted. This happens when the computed determinant is 0.

    Remarks:
        A square matrix that is not invertible is called singular or degenerate.
        A square matrix is singular if and only if its determinant is 0.


    References: https://en.wikipedia.org/wiki/Invertible_matrix

    Example of usage:

    [TestMatrix1:
     Test_M1: R$ 0.8951   Test_M2: R$ 0.2664   Test_M3: R$ -0.1614
     Test_M4: R$ -0.7502  Test_M5: R$ 1.7135   Test_M6: R$ 0.0367
     Test_M7: R$ 0.0389   Test_M8: R$ -0.0685  Test_M9: R$ 1.0296]

    [TestMatrix2: R$ 0 #9]

        call InvertMatrix_3x3_Double TestMatrix1, TestMatrix2

;;


Proc InvertMatrix_3x3_Double:
    Arguments @InputMatrix, @OutputMatrix
    Structure @TempStruct 264, @TmpDeterminantDis 0, @TmpOutMatrix1Dis 8, @TmpOutMatrix2Dis 136
    Uses edi, esi, ecx, ebx


    finit

    mov esi D@InputMatrix
    lea edi D@TmpOutMatrix1Dis

;;
[FloatMatrices.M1Dis (a)   FloatMatrices.M2Dis (b) FloatMatrices.M3Dis (c)
FloatMatrices.M4Dis (d)   FloatMatrices.M5Dis (e) FloatMatrices.M6Dis (f)
FloatMatrices.M7Dis (g)   FloatMatrices.M8Dis (h) FloatMatrices.M9Dis] (i)

;;

    ; A = ei-fh
    fld R$esi+FloatMatrices.M5Dis | fmul R$esi+FloatMatrices.M9Dis
    fld R$esi+FloatMatrices.M6Dis | fmul R$esi+FloatMatrices.M8Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M1Dis

    ; B = -(di-fg) = fg-di
    fld R$esi+FloatMatrices.M6Dis | fmul R$esi+FloatMatrices.M7Dis
    fld R$esi+FloatMatrices.M4Dis | fmul R$esi+FloatMatrices.M9Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M2Dis

    ; C = dh-eg
    fld R$esi+FloatMatrices.M4Dis | fmul R$esi+FloatMatrices.M8Dis
    fld R$esi+FloatMatrices.M5Dis | fmul R$esi+FloatMatrices.M7Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M3Dis

    ; D = -(bi-ch) = ch-bi
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M8Dis
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M9Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M4Dis

    ; E =  ai-cg
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M9Dis
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M7Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M5Dis

    ; F = -(ah-bg) = bg-ah
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M7Dis
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M8Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M6Dis

    ; G = bf-ce
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M6Dis
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M5Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M7Dis

    ; H =  -(af-cd) = cd-af
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M4Dis
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M6Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M8Dis

    ; I = ae-bd
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M5Dis
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M4Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M9Dis

    ; get determinant
    lea eax D@TmpDeterminantDis
    call GetDeterminantOfMatrix3x3_Double D@InputMatrix, eax
    On eax = 0, ExitP ; Matrix not invertible
    fld1 | fdiv R@TmpDeterminantDis | fstp R@TmpDeterminantDis

    lea ebx D@TmpOutMatrix2Dis
    call Fast_MatrixTranspose_Double edi, ebx, 3, 3

    mov edi D@OutputMatrix
    xor ecx ecx
    Do
        fld R$ebx+ecx*8 | fmul R@TmpDeterminantDis | fstp R$edi+ecx*8
        inc ecx
    Loop_Until ecx => 9

    mov eax &TRUE

EndP
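
The same cofactor / transpose / scale sequence, sketched in Python on a flat, row-major list laid out like M1..M9. The function names are hypothetical, and the exact-zero determinant test simply mirrors the routine above.

```python
def cofactors3(m):
    """Cofactor matrix of a flat, row-major 3x3 list [a, b, c, d, e, f, g, h, i]."""
    a, b, c, d, e, f, g, h, i = m
    return [
        e * i - f * h, f * g - d * i, d * h - e * g,   # A, B, C
        c * h - b * i, a * i - c * g, b * g - a * h,   # D, E, F
        b * f - c * e, c * d - a * f, a * e - b * d,   # G, H, I
    ]


def transpose3(m):
    """Transpose of a flat, row-major 3x3 list."""
    return [m[col * 3 + row] for row in range(3) for col in range(3)]


def matmul3(x, y):
    """Product of two flat, row-major 3x3 lists."""
    return [sum(x[r * 3 + k] * y[k * 3 + c] for k in range(3))
            for r in range(3) for c in range(3)]


def invert3(m):
    """Inverse = transpose(cofactors) / determinant, or None if singular."""
    cof = cofactors3(m)
    # Expansion along the first row reuses the cofactors A, B and C.
    det = m[0] * cof[0] + m[1] * cof[1] + m[2] * cof[2]
    if det == 0.0:
        return None
    inv_det = 1.0 / det
    return [x * inv_det for x in transpose3(cof)]


# Same values as TestMatrix1 in the usage example above.
test_matrix = [0.8951, 0.2664, -0.1614,
               -0.7502, 1.7135, 0.0367,
               0.0389, -0.0685, 1.0296]
inverse = invert3(test_matrix)
product = matmul3(test_matrix, inverse)  # should be close to the identity
```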


GetDeterminantOfMatrix3x3_Double


__________________________________________________________________________________________________
;;
    GetDeterminantOfMatrix3x3_Double
   
    This function retrieves the determinant of a 3x3 matrix

    Parameters:
   
        pMatrix - A pointer to the 3x3 square matrix whose determinant we want to find.
                  Each element of the matrix must be a Double FPU value (Real8)
        pDeterminant - A pointer to a buffer that receives the calculated determinant. The buffer must hold one
                   double FPU value (Real8).

    Return Value:   If the function succeeds, it returns TRUE and stores the determinant value in the output buffer (pDeterminant).
                    If the function fails, it returns FALSE, meaning that the square matrix is singular. If you are using the function
                    to compute the inverse of a matrix, a FALSE result also confirms that the matrix cannot be inverted.


;;

Proc GetDeterminantOfMatrix3x3_Double:
    Arguments @pMatrix, @pDeterminant
    Structure @DeterminantData 8, @DeterminantDataDis 0
    Uses esi, edi

    mov esi D@pMatrix

    fld R$esi+FloatMatrices.M1Dis
    fld R$esi+FloatMatrices.M5Dis | fld R$esi+FloatMatrices.M9Dis | fmulp ST1 ST0
    fld R$esi+FloatMatrices.M6Dis | fld R$esi+FloatMatrices.M8Dis | fmulp ST1 ST0
    fchs
    faddp ST1 ST0
    fmulp ST1 ST0
    fld R$esi+FloatMatrices.M2Dis
    fchs
    fld R$esi+FloatMatrices.M4Dis | fld R$esi+FloatMatrices.M9Dis | fmulp ST1 ST0
    fld R$esi+FloatMatrices.M6Dis | fld R$esi+FloatMatrices.M7Dis | fmulp ST1 ST0
    fchs
    faddp ST1 ST0
    fmulp ST1 ST0
    fld R$esi+FloatMatrices.M3Dis
    fld R$esi+FloatMatrices.M4Dis | fld R$esi+FloatMatrices.M8Dis | fmulp ST1 ST0
    fld R$esi+FloatMatrices.M5Dis | fld R$esi+FloatMatrices.M7Dis | fmulp ST1 ST0
    fchs
    faddp ST1 ST0
    fmulp ST1 ST0
    faddp ST1 ST0
    faddp ST1 ST0
    fstp R@DeterminantDataDis

    mov edi D@pDeterminant
    Fpu_If R@DeterminantDataDis = R$Float_Zero
        fldz | fstp R$edi ; make sure the output holds an exact zero
        xor eax eax
    Fpu_Else
        fld R@DeterminantDataDis | fstp R$edi
        mov eax &TRUE
    Fpu_End_If

EndP
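
For reference, the FPU sequence above is the cofactor expansion along the first row, a(ei - fh) - b(di - fg) + c(dh - eg). The Python sketch below shows it together with a hypothetical tolerance-based singularity test. The routine above compares against exact zero, so `is_singular` and its `rel_tol` default are assumptions added for illustration, not part of the posted code.

```python
def det3(m):
    """Determinant of a flat, row-major 3x3 list [a, b, c, d, e, f, g, h, i]."""
    a, b, c, d, e, f, g, h, i = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def is_singular(m, rel_tol=1e-12):
    """Treat the matrix as singular when |det| is tiny relative to its entries.

    The determinant is a sum of products of three entries, so the cube of the
    largest absolute entry is used as the scale. This is one possible choice,
    not the test used by the RosAsm routine.
    """
    scale = max(abs(x) for x in m) ** 3
    return abs(det3(m)) <= rel_tol * scale


exactly_singular = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
nearly_singular = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0000000000001]
```

An exact comparison catches `exactly_singular` but may let `nearly_singular` through, and inverting such a matrix produces huge, unreliable coefficients. Whether that matters depends on the matrices you feed in.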


Fast_MatrixTranspose_Double
;;
    Fast_MatrixTranspose_Double
   
    This function transposes a matrix of any size (Width x Height)
   
        Parameters:
       
            Input - A pointer to a matrix of any size (width and height) to be transposed.
                    Each element of the matrix must be a Real8 (Double).
            Output - A pointer to a buffer to receive the transposed data.
                     The size of the buffer must be at least 128 bytes (4*4*8 = Width(4)*Height(4)*8(size of a Real8 FPU value)).
                     Also, the buffer must be aligned to 32 bytes (4 Real8) so the resulting transpose can be computed
                     properly without affecting the data located after the end of the matrix.
                     In other words, if your matrix has a size of 5*5, you will need a buffer of
                     200 bytes (5*5*8) plus 24 extra bytes (3 Real8 FPU values), since the internal computation of the transposition
                     works on 4 Real8 at a time. So, if the matrix height is not a multiple of 4, you will need extra Real8 slots in the buffer to complete the last block.
                   
            Width - The width of the matrix (an integer)
            Height - The height of the matrix (an integer)


            Return Values: On exit the function returns the pointer to the start of the transposed matrix.
                           In other words, it returns the same start address of the buffer you passed in the Output parameter,
                           which now contains the transposed matrix


            Example of usage:

            [Teste6x4:  R$  1,  2,  3,  4,  5, 6,
                        R$ 26,  7,  8,  9, 10,27
                        R$ 11, 12, 13, 14, 15,28
                        R$ 16, 17, 18, 19, 20,29]

            [MyMatrixBuffer: R$ 0 #(6*4)]; No need for padding buffers, since it is a multiple of 4

                call Fast_MatrixTranspose_Double Teste6x4, ebx, 6, 4



            [Teste5x5:  R$  1,  2,  3,  4,  5,
                        R$ 26,  7,  8,  9, 10,
                        R$ 11, 12, 13, 14, 15,
                        R$ 16, 17, 18, 19, 20,
                        R$ 46, 117, 4, 129, 23]

            [MyMatrixBuffer: R$ 0 #(5*5)
             PaddingBuffer: R$ 0 #3] ; Extra 3 Real8 FPU to complete the buffer size

                call Fast_MatrixTranspose_Double Teste5x5, ebx, 5, 5


    Reference:
        http://masm32.com/board/index.php?topic=6105.15
;;

Proc Fast_MatrixTranspose_Double:
    Arguments @Input, @Output, @Width, @Height
    Local @CurXPos, @RemainderY, @MaxYPos
    Uses esi, edi, ebx, ecx, edx

    mov esi D@Input
    mov edi D@Output

    ; get remainders for edi
    mov D@RemainderY 0
    mov edx D@Height | mov ecx edx | shr edx 2 | mov D@MaxYPos edx | and ecx 3; Check if value (Height) is a multiple of 4
    jz L1>
        ; found not multiple of 4
        inc D@MaxYPos ; if height have a remainder (i mean, not multiple of 4, increment it)
        mov eax 4 | sub eax ecx | shl eax 3
        mov D@RemainderY eax
L1:

    mov eax D@Width |  mov D@CurXPos eax | shl eax 3 | lea ebx D$eax+eax*2 ; multiply by 3: eax = Width*8, ebx = Width*8*3

L2:
    mov ecx D@MaxYPos
    mov edx esi
    Align 16 ; <---- Must be aligned to 16 to gain more speed and stability

    L8:
         ; copy the 1st 4 Qwords from esi to register XMM
        movq XMM1 Q$edx+eax
        movq XMM0 Q$edx
        pslldq xmm1 8
        xorps XMM0 XMM1
        movupd X$edi xmm0

        movq XMM1 Q$edx+ebx
        pslldq xmm1 8
        movq XMM0 Q$edx+eax*2
        xorps XMM0 XMM1
        movupd X$edi+16 xmm0

        lea edx D$edx+eax*4
        add edi (8*4) ; advance one xmm reg
        dec ecx | jg L8<

    sub edi D@RemainderY; adjust edi from the remainder in YPos only
    add esi 8
    dec D@CurXPos | jnz L2<<

    mov eax D@RemainderY
    ; clear remainder bytes if any
    test eax eax | jz L1>
        shr eax 3 | jz L1>  | mov D$edi 0 | mov D$edi+4 0
        dec eax | jz L1>    | mov D$edi+8 0 | mov D$edi+12 0
        dec eax | jz L1>    | mov D$edi+16 0 | mov D$edi+20 0

L1:

    mov eax D@Output

EndP
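
Ignoring the SSE2 details, what the routine computes can be sketched in Python as a plain index remap. `transpose_flat` and `padded_output_len` are hypothetical names, and the padding rule is only a reading of the remainder handling above (each output column is written four Real8 at a time), so treat it as an assumption.

```python
def transpose_flat(src, width, height):
    """Transpose a row-major matrix with `height` rows of `width` elements.

    The result is row-major too, with `width` rows of `height` elements:
    out[col * height + row] = src[row * width + col].
    """
    return [src[row * width + col] for col in range(width) for row in range(height)]


def padded_output_len(width, height, block=4):
    """Number of Real8 slots the output buffer needs, padding included.

    Assumption: every output column is stored in blocks of `block` values,
    so the last column can spill past the end when height is not a multiple.
    """
    padding = (block - height % block) % block
    return width * height + padding


# The 6x4 example matrix from the comment block above.
teste_6x4 = [1, 2, 3, 4, 5, 6,
             26, 7, 8, 9, 10, 27,
             11, 12, 13, 14, 15, 28,
             16, 17, 18, 19, 20, 29]
transposed = transpose_flat(teste_6x4, 6, 4)
```

For the 5x5 case this gives 25 + 3 = 28 Real8 (224 bytes), which matches the buffer size described in the comment block.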


Extra macros used for the FPU comparisons


[Fpu_Do | C0:]
[Fpu_Loop_Until | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 C0<<]

[.Fpu_Do | C1:]
[.Fpu_Loop_Until | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 C1<<]


[Fpu_While | B0: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 B1>>]
[Fpu_End_While | jmp B0<< | B1:]

[.Fpu_While | B2: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 B3>>]
[.Fpu_End_While | jmp B2<< | B3:]





[Fpu_If | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R0>>]
[Fpu_Else_If | jmp R5>> | R0: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R0>>]
[Fpu_Else | jmp R5>> | R0:]
[Fpu_End_If | R0: | R5:]

[.Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R1>>]
[.Fpu_Else_If | jmp R6>> | R1: |  fld #3 | fld #1| fcompp | fstsw ax | fwait | sahf | jn#2 R1>>]
[.Fpu_Else | jmp R6>> | R1:]
[.Fpu_End_If | R1: | R6:]

[..Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R2>>]
[..Fpu_Else_If | jmp R7>> | R2: |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R2>>]
[..Fpu_Else | jmp R7>> | R2:]
[..Fpu_End_If | R2: | R7:]

[...Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R3>>]
[...Fpu_Else_If | jmp R8>> | R3: |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R3>>]
[...Fpu_Else | jmp R8>> | R3:]
[...Fpu_End_If | R3: | R8:]

[Fpu_If_And    | Fpu_If #1 #2 #3    | #+3]
[.Fpu_If_And   | .Fpu_If #1 #2 #3   | #+3]
[..Fpu_If_And  | ..Fpu_If #1 #2 #3  | #+3]
[...Fpu_If_And | ...Fpu_If #1 #2 #3 | #+3]

[Fpu_Else_If_And    | Fpu_Else    | Fpu_If_And    #F>L]
[.Fpu_Else_If_And   | .Fpu_Else   | .Fpu_If_And   #F>L]
[..Fpu_Else_If_And  | ..Fpu_Else  | ..Fpu_If_And  #F>L]
[...Fpu_Else_If_And | ...Fpu_Else | ...Fpu_If_And #F>L]





FindWhiteRefEx can be found here:
http://masm32.com/board/index.php?topic=7640.0

daydreamer

great work Guga :t
I'm just not sure whether I should post here or in the other thread in the Orphanage?

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

guga

Thanks.

You can post here, if you like. I posted here because it is about a new function in RosAsm syntax, not a general question about the chromatic adaptation technique ;)

I could try porting these to MASM, but it has been a long time since I last worked seriously with MASM syntax. It shouldn't be a problem to port, though; RosAsm syntax is similar to NASM.

daydreamer

What happened to this project? I read about the Sin City movie, where only parts of the picture are colorized. Maybe a randomized colorize function could give some movie a noir look?

guga

Hi daydreamer

I'm working on it, but slowly. I had to take a break this past month because of some work I'm doing at my company. I'll return to it as soon as possible. I'm finishing a huge list of movies to catalog and it is leaving me without any free time :(