Recent Posts

Pages: [1] 2 3 ... 10
1
The Workshop / Re: Hi resolution Sleep
« Last post by Siekmanski on Today at 10:07:41 AM »
Yep, it's a bit of a problem with the timers in windows.
We have to accept it.  :biggrin:

At least we can delay the execution with ~0.00005 seconds which is 20 times shorter than invoke Sleep,1
It can be useful in high performance apps.
2
The Workshop / Re: SetCurrentConsoleFontEx
« Last post by hutch-- on Today at 10:05:52 AM »
MSD,

When you include the masm32rt.inc file you already have the processor model. The ".stack" setting is not used in win32, a 32 bit PE file sets the stack reserve and stack commit in the linker options.

CHANGE THIS

.386
.model flat,stdcall
.stack 4096
include c:\masm32\include\masm32rt.inc

TO

include c:\masm32\include\masm32rt.inc
.686p
.mmx
.xmm
3
The Workshop / Re: Hi resolution Sleep
« Last post by hutch-- on Today at 10:00:58 AM »
I think you will always have limitations due to ring 0 task switching.
4
The Workshop / Re: Hi resolution Sleep
« Last post by Siekmanski on Today at 08:19:04 AM »
IReferenceClock Sleep test.
The results are a bit disappointing for small resolutions....

Code: [Select]
Resolution: 1.0 second
0.999471650726559 seconds
1.000116704776725 seconds
1.000111326998264 seconds
1.000117542872069 seconds

Resolution: 0.1 seconds
0.099520679304213 seconds
0.099822114263126 seconds
0.099843834900804 seconds
0.099841250773492 seconds

Resolution: 0.01 seconds
0.009674832974582 seconds
0.009806623467508 seconds
0.010794737878697 seconds
0.009871436174151 seconds

Resolution: 0.001 seconds
0.000068374611857 seconds
0.000069771437431 seconds
0.000074450803105 seconds
0.000027447622533 seconds

Resolution: 0.0001 seconds
0.000060063499691 seconds
0.000067326992676 seconds
0.000069352389759 seconds
0.000068234929300 seconds

Resolution: 0.00001 seconds
0.000039320639914 seconds
0.000040088893980 seconds
0.000059853975854 seconds
0.000068095246742 seconds
5
RosAsm / Invert Matrix 3x3 for Real8 variables
« Last post by guga on Today at 07:15:43 AM »
This function computes the inverted 3x3 matrix formed by a Double FPU (Real8) data.

InvertMatrix_3x3_Double
Code: [Select]

;;
    InvertMatrix_3x3_Double
   
    This function generates the inversion of a 3x3 squared matrix

    Parameters:
        InputMatrix: A pointer to a 3x3 matrix used as input. The size of the element on each member
                     of the matrix must be a douuble (Real8) FPU data.

        OutputMatrix: A pointer to a buffer that wil receive the generated 3x3 matrix.
                      The size of the buffer must be 72 bytes (equivalent to 9 Real8 FPU variables), representing a 3x3 matrix formed by
                      double FPU. So, 8*3*3 = 72.

    Return Value:
        If the function suceeds it returns TRUE and the OutputMatrix parameter will hold the converted data.
        If it fails it returns FALSE meaning that the matrix can not be inverted. It happens when the computed determinant is 0.

    Remarks:
        A square matrix that is not invertible is called singular or degenerate.
        A square matrix is singular if and only if its determinant is 0.


    References: https://en.wikipedia.org/wiki/Invertible_matrix

    Example of usage:

    [TestMatrix1:
     Test_M1: R$ 0.8951   Test_M2: R$ 0.2664   Test_M3: R$ -0.1614
     Test_M4: R$ -0.7502  Test_M5: R$ 1.7135   Test_M6: R$ 0.0367
     Test_M7: R$ 0.0389   Test_M8: R$ -0.0685  Test_M9: R$ 1.0296]

    [TestMatrix2: R$ 0 #9]

        call InvertMatrix_3x3_Double TestMatrix1, TestMatrix2

;;


Proc InvertMatrix_3x3_Double:
    Arguments @InputMatrix, @OutputMatrix
    Structure @TempStruct 264, @TmpDeterminantDis 0, @TmpOutMatrix1Dis 8, @TmpOutMatrix2Dis 136
    Uses edi, esi, ecx, ebx


    finit

    mov esi D@InputMatrix
    lea edi D@TmpOutMatrix1Dis

;;
[FloatMatrices.M1Dis (a)   FloatMatrices.M2Dis (b) FloatMatrices.M3Dis (c)
 FloatMatrices.M4Dis (d)   FloatMatrices.M5Dis (e) FloatMatrices.M6Dis (f)
 FloatMatrices.M7Dis (g)   FloatMatrices.M8Dis (h) FloatMatrices.M9Dis] (i)

;;

    ; A = ei-fh
    fld R$esi+FloatMatrices.M5Dis | fmul R$esi+FloatMatrices.M9Dis
    fld R$esi+FloatMatrices.M6Dis | fmul R$esi+FloatMatrices.M8Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M1Dis

    ; B = -(di-fg) = fg-di
    fld R$esi+FloatMatrices.M6Dis | fmul R$esi+FloatMatrices.M7Dis
    fld R$esi+FloatMatrices.M4Dis | fmul R$esi+FloatMatrices.M9Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M2Dis

    ; C = dh-eg
    fld R$esi+FloatMatrices.M4Dis | fmul R$esi+FloatMatrices.M8Dis
    fld R$esi+FloatMatrices.M5Dis | fmul R$esi+FloatMatrices.M7Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M3Dis

    ; D = -(bi-ch) = ch-bi
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M8Dis
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M9Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M4Dis

    ; E =  ai-cg
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M9Dis
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M7Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M5Dis

    ; F = -(ah-bg) = bg-ah
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M7Dis
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M8Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M6Dis

    ; G = bf-ce
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M6Dis
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M5Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M7Dis

    ; H =  -(af-cd) = cd-af
    fld R$esi+FloatMatrices.M3Dis | fmul R$esi+FloatMatrices.M4Dis
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M6Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M8Dis

    ; I = ae-bd
    fld R$esi+FloatMatrices.M1Dis | fmul R$esi+FloatMatrices.M5Dis
    fld R$esi+FloatMatrices.M2Dis | fmul R$esi+FloatMatrices.M4Dis | fchs | faddp ST1 ST0
    fstp R$edi+FloatMatrices.M9Dis

    ; get determinant
    lea eax D@TmpDeterminantDis
    call GetDeterminantOfMatrix3x3_Double D@InputMatrix, eax
    On eax = 0, ExitP ; Matrix not invertible
    fld1 | fdiv R@TmpDeterminantDis | fstp R@TmpDeterminantDis

    lea ebx D@TmpOutMatrix2Dis
    call Fast_MatrixTranspose_Double edi, ebx, 3, 3

    mov edi D@OutputMatrix
    xor ecx ecx
    Do
        fld R$ebx+ecx*8 | fmul R@TmpDeterminantDis | fstp R$edi+ecx*8
        inc ecx
    Loop_Until ecx => 9

    mov eax &TRUE

EndP

Aditional functions:

GetDeterminantOfMatrix3x3_Double
Code: [Select]
__________________________________________________________________________________________________
;;
    GetDeterminantOfMatrix3x3_Double
   
    This function retrieves the determinant of a 3x3 matrix

    Parameters:
   
        pMatrix - The pointer to a 3x3 squared matrix used as input from where we want to find it´s determinant
                  The size of each element of the matrix must be a Double FPU data (Real8)
        pDeterminant - A pointer to a buffer to store the calculated determinant in Real8 data. The size of the buffer must be in
                   double FPU (Real8).

    Return Value:   If the function suceeds, it returns TRUE and also store the determinant value on the output buffer (pDeterminant).
                    If the function fails, it returns FALSE meaning that the square matrix is singular and also if you are using the function
                    to determine the inverse of a matrix, a FALSE result can also be used to confirm that the matrix can not be inverted.


;;

Proc GetDeterminantOfMatrix3x3_Double:
    Arguments @pMatrix, @pDeterminant
    Structure @DeterminantData 8, @DeterminantDataDis 0
    Uses esi, edi

    mov esi D@pMatrix

    fld R$esi+FloatMatrices.M1Dis
    fld R$esi+FloatMatrices.M5Dis | fld R$esi+FloatMatrices.M9Dis | fmulp ST1 ST0
    fld R$esi+FloatMatrices.M6Dis | fld R$esi+FloatMatrices.M8Dis | fmulp ST1 ST0
    fchs
    faddp ST1 ST0
    fmulp ST1 ST0
    fld R$esi+FloatMatrices.M2Dis
    fchs
    fld R$esi+FloatMatrices.M4Dis | fld R$esi+FloatMatrices.M9Dis | fmulp ST1 ST0
    fld R$esi+FloatMatrices.M6Dis | fld R$esi+FloatMatrices.M7Dis | fmulp ST1 ST0
    fchs
    faddp ST1 ST0
    fmulp ST1 ST0
    fld R$esi+FloatMatrices.M3Dis
    fld R$esi+FloatMatrices.M4Dis | fld R$esi+FloatMatrices.M8Dis | fmulp ST1 ST0
    fld R$esi+FloatMatrices.M5Dis | fld R$esi+FloatMatrices.M7Dis | fmulp ST1 ST0
    fchs
    faddp ST1 ST0
    fmulp ST1 ST0
    faddp ST1 ST0
    faddp ST1 ST0
    fstp R@DeterminantDataDis

    mov edi D@pDeterminant
    Fpu_If R@DeterminantDataDis = R$Float_Zero
        fldz | fstp R$edi ; make sure it is zero with any other rounding down value
        xor eax eax
    Fpu_Else
        fld R@DeterminantDataDis | fstp R$edi
        mov eax &TRUE
    Fpu_End_If

EndP

Fast_MatrixTranspose_Double
Code: [Select]

;;
    Fast_MatrixTranspose_Double
   
    This function transposes a marix with any size n*n
   
        Parameters:
       
            Input - A pointer to a matrix of any size (width and height) to be transposed.
                    The elements opf the matrix must be a Real8 (Double) each.
            Output - A pointer to a buffer to receive the transposed data.
                     The size of the buffer must be at least 128 bytes (4*4*8 = Width(4)*Height(4)*8(size of Real8FPU)).
                     Also, the buffer must be aligned to 32 bytes (4 Real8) so it can compute the resultant transpose
                     properly without affecting the data that are located after the end of the matrix.
                     In other words, If your matrix have a size of 5*5, it means that you will need a buffer of
                     200 Bytes (5*5*8) plus 24 extra bytes (3 Real8 Fpu data), since the internal computation of the transposition matrix
                     work from 4*4 data each. So, if the matrix size is not a multiple of you will need more Real8 Fpu on the buffer to complete.
                   
            Width - The width of the matrix (an integer)
            Height - The height of the matrix (an integer)


            Return Values: On exit the function will return the pointer to the start of the transposed matrix.
                           In other words, it will return the same start address of the buffer you settled on Output parameter
                           but will then contain the transposed matrix


            Example of usage:

            [Teste6x4:  R$  1,  2,  3,  4,  5, 6,
                        R$ 26,  7,  8,  9, 10,27
                        R$ 11, 12, 13, 14, 15,28
                        R$ 16, 17, 18, 19, 20,29]

            [MyMatrixBuffer: R$ 0 #(6*4)]; No need for padding buffers, since it is a multiple of 4

                call Fast_MatrixTranspose_Double Teste6x4, ebx, 6, 4



            [Teste5x5:  R$  1,  2,  3,  4,  5,
                        R$ 26,  7,  8,  9, 10,
                        R$ 11, 12, 13, 14, 15,
                        R$ 16, 17, 18, 19, 20,
                        R$ 46, 117, 4, 129, 23,]

            [MyMatrixBuffer: R$ 0 #(5*5)
             PaddingBuffer: R$ 0 #3] ; Extra 3 Real8 FPU to complete the buffer size

                call Fast_MatrixTranspose_Double Teste6x4, ebx, 6, 4


    Reference:
        http://masm32.com/board/index.php?topic=6105.15
;;

Proc Fast_MatrixTranspose_Double:
    Arguments @Input, @Output, @Width, @Height
    Local @CurXPos, @RemainderY, @MaxYPos
    Uses esi, edi, ebx, ecx, edx

    mov esi D@Input
    mov edi D@Output

    ; get remainders for edi
    mov D@RemainderY 0
    mov edx D@Height | mov ecx edx | shr edx 2 | mov D@MaxYPos edx | and ecx 3; Check if value (Height) is a multiple of 4
    jz L1>
        ; found not multiple of 4
        inc D@MaxYPos ; if height have a remainder (i mean, not multiple of 4, increment it)
        mov eax 4 | sub eax ecx | shl eax 3
        mov D@RemainderY eax
L1:

    mov eax D@Width |  mov D@CurXPos eax | shl eax 3 | lea ebx D$eax+eax*2 ; muylby 3; ebx = Width*4*3 . Width*8*3

L2:
    mov ecx D@MaxYPos
    mov edx esi
    Align 16 ; <---- Must be aligned to 16 to gain more speed and stability

    L8:
         ; copy the 1st 4 Qwords from esi to register XMM
        movq XMM1 Q$edx+eax
        movq XMM0 Q$edx
        pslldq xmm1 8
        xorps XMM0 XMM1
        movupd X$edi xmm0

        movq XMM1 Q$edx+ebx
        pslldq xmm1 8
        movq XMM0 Q$edx+eax*2
        xorps XMM0 XMM1
        movupd X$edi+16 xmm0

        lea edx D$edx+eax*4
        add edi (8*4) ; advance one xmm reg
        dec ecx | jg L8<

    sub edi D@RemainderY; adjust edi from the remainder in YPos only
    add esi 8
    dec D@CurXPos | jnz L2<<

    mov eax D@RemainderY
    ; clear remainder bytes if any
    test eax eax | jz L1>
        shr eax 3 | jz L1>  | mov D$edi 0 | mov D$edi+4 0
        dec eax | jz L1>    | mov D$edi+8 0 | mov D$edi+12 0
        dec eax | jz L1>    | mov D$edi+16 0 | mov D$edi+20 0

L1:

    mov eax D@Output

EndP

Extra Macros to compute the FPU comparitions

Code: [Select]


[Fpu_Do | C0:]
[Fpu_Loop_Until | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 C0<<]

[.Fpu_Do | C1:]
[.Fpu_Loop_Until | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 C1<<]


[Fpu_While | B0: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 B1>>]
[Fpu_End_While | jmp B0<< | B1:]

[.Fpu_While | B2: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 B3>>]
[.Fpu_End_While | jmp B2<< | B3:]





[Fpu_If | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R0>>]
[Fpu_Else_If | jmp R5>> | R0: | fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R0>>]
[Fpu_Else | jmp R5>> | R0:]
[Fpu_End_If | R0: | R5:]

[.Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R1>>]
[.Fpu_Else_If | jmp R6>> | R1: |  fld #3 | fld #1| fcompp | fstsw ax | fwait | sahf | jn#2 R1>>]
[.Fpu_Else | jmp R6>> | R1:]
[.Fpu_End_If | R1: | R6:]

[..Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R2>>]
[..Fpu_Else_If | jmp R7>> | R2: |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R2>>]
[..Fpu_Else | jmp R7>> | R2:]
[..Fpu_End_If | R2: | R7:]

[...Fpu_If |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R3>>]
[...Fpu_Else_If | jmp R8>> | R3: |  fld #3 | fld #1 | fcompp | fstsw ax | fwait | sahf | jn#2 R3>>]
[...Fpu_Else | jmp R8>> | R3:]
[...Fpu_End_If | R3: | R8:]

[Fpu_If_And    | Fpu_If #1 #2 #3    | #+3]
[.Fpu_If_And   | .Fpu_If #1 #2 #3   | #+3]
[..Fpu_If_And  | ..Fpu_If #1 #2 #3  | #+3]
[...Fpu_If_And | ...Fpu_If #1 #2 #3 | #+3]

[Fpu_Else_If_And    | Fpu_Else    | Fpu_If_And    #F>L]
[.Fpu_Else_If_And   | .Fpu_Else   | .Fpu_If_And   #F>L]
[..Fpu_Else_If_And  | ..Fpu_Else  | ..Fpu_If_And  #F>L]
[...Fpu_Else_If_And | ...Fpu_Else | ...Fpu_If_And #F>L]


Many tks to Rui, JJ and Siekmanski :t :t :t :t :t


Btw....If someone have a faster code that also can invert a squared matrix on any size, please post :) (And if you guys have, please post both versions, one using Real8 and other using Real4 Fpu ;) )
6
The Workshop / Re: Hi resolution Sleep
« Last post by TimoVJL on Today at 06:01:07 AM »
Silly small test project with PDB for debugging.
Code: [Select]
int main(void)
{
    CoInitialize( NULL );
    IReferenceClock* pRefClock;
    HRESULT hr = CoCreateInstance(&CLSID_SystemClock, NULL, CLSCTX_INPROC_SERVER, &IID_IReferenceClock, (void**)&pRefClock );
if (pRefClock) {
long long llTime1, llTime2;
DWORD dwCookie;
hr = pRefClock->lpVtbl->GetTime(pRefClock, &llTime1);
HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, TEXT("MyEvent"));
hr = pRefClock->lpVtbl->AdviseTime(pRefClock, llTime1, 10000000, hEvent, &dwCookie);
if (!hr) {
printf("wait...");
WaitForSingleObject(hEvent, INFINITE);
} else printf("error");
hr = pRefClock->lpVtbl->GetTime(pRefClock, &llTime2);
printf("time: %lld\n", llTime2 - llTime1);
    pRefClock->lpVtbl->Release(pRefClock);
}
    CoUninitialize();
    return 0;
}
7
Assembler/Compiler Technology / Re: My New ASM Code Editor
« Last post by i Z ! on Today at 03:45:57 AM »
, with plenty of screenshots added

oh.. Sorry, the screenshots are not yet visible on the MS Store listing. Let's give it a few more hours..
8
The Workshop / Re: Hi resolution Sleep
« Last post by Siekmanski on January 23, 2019, 11:46:35 PM »
Don't know exactly, worth investigating.  :biggrin:
I'll write something tonight so we can test it further.
9
The Workshop / Re: Hi resolution Sleep
« Last post by jj2007 on January 23, 2019, 11:28:04 PM »
It looks more like a timed stalling than a timed sleep....

Yes, it's more a joke.

Quote
IReferenceClock::GetTime
IReferenceClock::AdviseTime

https://docs.microsoft.com/en-us/windows/desktop/wmformat/ireferenceclock

It's an event driven 100-nanosecond units resolution timer.

Looks interesting, but I wouldn't be surprised to find NtDelayExecution under the hood. And that one is already used in
Code: [Select]
Delay until fTime$(s:1, "HH:mm:ss.123") ; use time$(current plus one second).
10
Assembler/Compiler Technology / Re: My New ASM Code Editor
« Last post by i Z ! on January 23, 2019, 09:04:33 PM »
A Asm editor for 39,99 €  , I do not buy the cat in the bag!  :biggrin:
Without test it like trial version.

There is a 30-day trial. You should be able to access it by clicking the dropdown arrow on the Buy button.

Edit: Thanks for looking into it and for pointing this out, so other users will know, 'cause it is indeed a bit hidden.
Pages: [1] 2 3 ... 10