testp

Started by mineiro, August 24, 2023, 07:22:42 AM

mineiro

These are the steps I used to compile and run testp, tested on Windows 8.
-----------------------------------------------------------------------------------
Search the internet for "visual studio 2022 download".
The link below points to the Brazilian Portuguese page.
https://visualstudio.microsoft.com/pt-br/downloads/
Download it and install it. It takes hours, be patient.

Go to the link below:
https://www.agner.org/optimize/
Download testp.zip.

Extract testp.zip to C:\
If asked for permission, enter your Windows administrator password to extract the zip file.
Extract PMCTest.zip; a folder will be created.


I'm using Windows 8. We need to disable driver signature enforcement to run the drivers supplied with PMCTest. Search for how to do this on your Windows version (on some versions you reach the boot options by pressing "F8" at startup).
https://soldered.com/learn/disabling-driver-signature-for-windows-8/


Now we need to launch a command prompt with administrator rights. Right-click Command Prompt, select "Run as administrator" in the menu that appears, and enter your password.

In the command prompt window, type:
cd \
cd testp\pmctest
startcounters.bat

If everything went fine, you should see something like:
Enabled 4 counters in each of 4 CPU cores
PMC number:     Counter name:
0x40000001      Core cyc
0x40000000      Instruct
0x00000000      Uops
0x00000001      L1D Miss

If you see something like that in the command prompt, it's working.

You can also look at the file make_a_obj.bat inside the PMCTest folder and skip the text below.

------------------------------------------------------------------------
Configuring our environment:
If you type ml64 in the command prompt and receive an error message, you need to do this step:
Open Windows Explorer, select "C:\" and search for "ml64.exe". We are interested in the folder (directory) where this file sits.
One option is to press Ctrl+F and type "ml64.exe".
On my machine it was found in:
C:\Program Files\Microsoft Visual Studio\2022\Community\Vc\Tools\MSVC\14.36.32532\bin\Hostx64\x64
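If you would rather script this search (it is the same search needed later for uuid.lib, LIBCMT.lib and libucrt.lib), here is a minimal Python sketch, assuming Python 3 is installed. `find_dirs_containing` is a hypothetical helper, not part of testp, and walking all of C:\ can take several minutes.

```python
import os

def find_dirs_containing(root, filename):
    """Hypothetical helper: return every directory under root that holds a
    file named filename, compared case-insensitively as Windows does."""
    wanted = filename.lower()
    hits = []
    # os.walk silently skips folders it cannot read (e.g. access denied)
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name.lower() == wanted for name in filenames):
            hits.append(dirpath)
    return hits
```

For example, `find_dirs_containing("C:\\", "ml64.exe")` should include the Hostx64\x64 folder shown above, possibly alongside other folders; pick the Hostx64\x64 one.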

Copy that path, switch to the command prompt and type:
PATH=%PATH%;
Now paste the copied folder path after it; the end result will be:
PATH=%PATH%;C:\Program Files\Microsoft Visual Studio\2022\Community\Vc\Tools\MSVC\14.36.32532\bin\Hostx64\x64

Now if you type ml64.exe, the program should be found instead of an error message.


---------------------------------------------------------------------------------
Now we need to change the PMCTestB64.asm file for our measurement.
Open PMCTestB64.asm with your text editor (I'm using Notepad). By default, this file comes configured for Windows, not Linux.

In PMCTestB64.asm, search for "rept 100" and you will see the lines below:

rept 100        ;example: 100 shift instructions
shr eax,5
endm

Now we need to comment out the shr instruction and insert ours:

rept 100        ;example: 100 shift instructions
mov rax,123
;shr eax,5
endm

Save file.

Switch to the command prompt again; we will assemble and link.
Type:
c:\testp\PMCTest>ml64 /c /Cx /W3 /Zi PMCTestB64.asm

An error message appeared:
PMCTestB64.asm(xxx) : error A2008: syntax error :MACRO
PMCTestB64.asm(xxx) : fatal error A1008:unmatched macro nesting

This error does not mean the file is broken. Most likely, newer versions of ml64 recognize SERIALIZE as an instruction mnemonic (Intel's SERIALIZE instruction), so it can no longer be used as a macro name. It can be solved by renaming every "SERIALIZE" in the file to another name; I chose "SERIALIZE1".
In Notepad, open "Replace", enter both names and click "Replace All". Save the file and run the previous command again.

OK, now we were able to assemble PMCTestB64.asm.
Now let's link it with the object file provided in the same folder (PMCTestA64.obj).
Type:
link /subsystem:console /out:movrax123.exe PMCTestA64.obj PMCTestB64.obj

I received an error when linking:
LINK : fatal error LNK1104: cannot open file 'uuid.lib'
----------------------------------------------------------------------------------------
So we need to locate where uuid.lib sits. In Explorer, search for "uuid.lib"; on my machine it was found in:
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\um\x64

Switch to the command prompt and type:
set LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\um\x64

Now let's try to link again:
link /subsystem:console /out:movrax123.exe PMCTestA64.obj PMCTestB64.obj

Another library was not found, this time "LIBCMT.lib". Again, let's find it on the computer. It was found in two places (I first used the second one):
C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\onecore\x64
C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\x64

Again, let's append this path to the LIB variable; note that paths are separated by ";":
set LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\um\x64;C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\x64
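The linker treats LIB as an ordered, ";"-separated list of directories and uses the first one that contains the requested library. Below is a simplified Python model of that lookup, only as an illustration: as far as I know, link.exe also looks in the current directory and in any /LIBPATH directories before LIB, which this sketch ignores, and `resolve_library` is a hypothetical name.

```python
import os

def resolve_library(libname, lib_env):
    """Hypothetical, simplified model of the LIB search: return the path in
    the first LIB directory that contains libname, or None if none does.
    Ignores the current directory and /LIBPATH, which link.exe checks first."""
    for directory in lib_env.split(";"):
        directory = directory.strip()
        if not directory:              # tolerate empty entries such as a trailing ";"
            continue
        candidate = os.path.join(directory, libname)
        if os.path.isfile(candidate):
            return candidate
    return None                        # the real linker reports LNK1104 here
```

Because the first match wins, the order of the entries matters when the same library exists in several folders, as with LIBCMT.lib, which exists both in lib\x64 and in lib\onecore\x64. On Windows you could call it as `resolve_library("libcmt.lib", os.environ.get("LIB", ""))`.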

Now, let's try to link again:
link /subsystem:console /out:movrax123.exe PMCTestA64.obj PMCTestB64.obj

This time "libucrt.lib" was not found. Again, let's search for this library; it was found in:
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\ucrt\x64
Let's append this path to the LIB variable, not forgetting the ";" separator:
set LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\um\x64;C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\x64;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\ucrt\x64

Again, let's try to link the program:
link /subsystem:console /out:movrax123.exe PMCTestA64.obj PMCTestB64.obj

Now there are other problems: unresolved external symbols such as __imp_CloseServiceHandle, __imp_CreateServiceA, __imp_OpenSCManagerA, ...
After many attempts, I realized that I should have used the other LIBCMT.lib path.

Instead of the path below:
C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\x64
it should be the path below:
C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\onecore\x64

So let's set the LIB variable again:
set LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\um\x64;C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\onecore\x64;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\ucrt\x64

And try to link again:
link /subsystem:console /out:movrax123.exe PMCTestA64.obj PMCTestB64.obj

Nice, movrax123.exe was created. Let's run it: it works.

Let's save these commands in a file so we don't need to type them every time.
Switch to Explorer and go to folder c:\testp\pmctest.
Create a file "configure.bat".
Insert the lines below into this text file:

-----------------------------------------------------------------
rem "Run as administrator" starts batch files in C:\Windows\System32,
rem so first switch to the folder where this .bat file lives
cd /d "%~dp0"
PATH=%PATH%;C:\Program Files\Microsoft Visual Studio\2022\Community\Vc\Tools\MSVC\14.36.32532\bin\Hostx64\x64
set LIB=C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\um\x64;C:\Arquivos de Programas\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\lib\onecore\x64;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22000.0\ucrt\x64
ml64 /c /Cx /W3 /Zi PMCTestB64.asm
if errorlevel 1 goto end
link /subsystem:console /out:x.exe PMCTestA64.obj PMCTestB64.obj
if errorlevel 1 goto end
x.exe
:end
PAUSE
-----------------------------------------------------------------

Now, whenever needed, right-click configure.bat and select "Run as administrator"; the program will be assembled, linked and run, and the result will be shown on screen.


I'd rather be this ambulant metamorphosis than to have that old opinion about everything

mineiro

Compiling testp on Linux.
I'm using Linux Mint, with gcc installed.
---------------------------------
Download testp.zip from the site below:
https://www.agner.org/optimize/
Extract it to your home directory, enter the testp directory, and extract PMCTest.zip and DriverSrcLinux.zip.

Download UASM from one of the sites below (select the Linux version):
https://www.terraspace.co.uk/uasm.html
https://github.com/Terraspace/UASM
https://github.com/Terraspace/UASM/releases

If you downloaded the UASM source code from GitHub, you can compile it yourself. Extract uasm-2.56.2.zip in your home folder.
Open a terminal (bash, Konsole, xterm, whatever you prefer) and enter the uasm-2.56.2 folder.
Type:
make -f gccLinux64.mak
After compiling, the executable can be found inside the GccUnixR folder. Copy the uasm executable to ~/testp/PMCTest, or copy it to /usr/bin (this needs root permissions).

Switch to ~/testp/DriverSrcLinux and type:
sudo ./install.sh
The command above compiles and installs the kernel module (driver).

Switch to the ~/testp/PMCTest directory and edit the file a64.sh, because we will use the UASM assembler:

a64.sh
-------------------------------------------
#!/bin/bash
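# compile and run PMCTest in 64-bit mode, assembling PMCTestB64.asm with UASM (MASM syntax)

# assembler to use: change to ./uasm if you copied the binary into this folder instead of /usr/bin
UASM=uasm

# Compile A file if modified
if [ PMCTestA.cpp -nt a64.o ] ; then
g++ -O2 -c -m64 -oa64.o PMCTestA.cpp
fi

$UASM -elf64 -DWINDOWS=0 -less -nologo PMCTestB64.asm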
if [ $? -ne 0 ] ; then exit ; fi

g++ a64.o PMCTestB64.o -ox -lpthread -z noexecstack
if [ $? -ne 0 ] ; then exit ; fi

./x
#---------------------------------------------------------
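The `[ PMCTestA.cpp -nt a64.o ]` test rebuilds a64.o only when the C++ source is newer than the object file, or when the object file does not exist yet. The same make-style rule, written in Python purely as an illustration; `needs_rebuild` is a hypothetical helper, not something a64.sh uses.

```python
import os

def needs_rebuild(source, target):
    """Mimic bash's [ source -nt target ]: true if source exists and target
    does not, or if source was modified more recently than target."""
    if not os.path.exists(source):
        return False
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)
```

This is why PMCTestA.cpp is compiled only on the first run, while PMCTestB64.asm, which you edit for every test, is assembled every time.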

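After executing a64.sh you should see some output.
If you receive an error instead, check the WINDOWS definition:

In my case the error happened because WINDOWS inside PMCTestB64.asm was set to 1. With WINDOWS set to 1, TestLoop follows the Windows x64 calling convention (it takes the thread number from ecx and saves rsi, rdi and xmm6-xmm15) instead of the Linux one (thread number in edi).
So open PMCTestB64.asm and search for the string below: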
WINDOWS  EQU    1
and change to:
WINDOWS  EQU    0

Now if you run ./a64.sh again, you will see output like this:

Cannot make counter 4. No matching counter definition found

     Clock   Core cyc   Instruct       Uops
       300        244        244        244
        94         77         77         77
        96         79         79         79
        96         79         79         79
        96         78         78         78
        96         76         76         76
       100         77         77         77
        92         76         76         76
        96         78         78         78
        98         77         77         77
        94         78         78         78
        98         77         77         77
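The first row is much higher than the others, most likely because the first repetition runs with cold caches and branch predictors, so the later rows are the ones to read. A small Python sketch for summarizing such a table, assuming the format shown above (one header line, then whitespace-separated integers); it drops the first row and takes the minimum of each column. `summarize` is a hypothetical helper.

```python
import re

def summarize(table_text, skip_rows=1):
    """Minimum of each column in a PMCTest result table, ignoring the first
    skip_rows data rows (warm-up). Column names in the header are split on
    runs of two or more spaces, because names such as "Core cyc" contain one."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    names = re.split(r"\s{2,}", lines[0].strip())
    rows = [[int(v) for v in ln.split()] for ln in lines[1:]]
    rows = rows[skip_rows:]
    return {name: min(row[i] for row in rows) for i, name in enumerate(names)}

sample = """
     Clock   Core cyc   Instruct       Uops
       300        244        244        244
        94         77         77         77
        96         79         79         79
"""
print(summarize(sample))  # {'Clock': 94, 'Core cyc': 77, 'Instruct': 77, 'Uops': 77}
```

Paste the table lines only (without the "Cannot make counter" message) into the function.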

       
Nice, it's working.
Now let's change PMCTestB64.asm to insert the code we want to measure.
Open your text editor and search for the lines below:
rept 100        ; example: 100 shift instructions
shr eax,5
endm

Comment out shr and change the block to:
rept 1000        ; 1000 copies of the instruction under test
mov rax,123
;shr eax,5
endm

Save the file and run ./a64.sh.

Now let's measure mov eax,123:
rept 1000        ; 1000 copies of the instruction under test
mov eax,123
;mov rax,123
;shr eax,5
endm
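One thing worth checking when comparing the two variants is code size. This Python sketch assumes the assembler emits `B8 imm32` for mov eax,123 and the REX.W-prefixed `C7 /0 imm32` form for mov rax,123; verify this against a listing file, since an assembler may choose other encodings. It also assumes a 16-byte instruction fetch block, which is typical of many x86 cores but not all.

```python
import math
import struct

IMM = 123
REPS = 1000          # rept 1000
FETCH_BLOCK = 16     # assumption: bytes fetched per cycle on many x86 cores

# mov eax, 123  ->  B8+rd, imm32
mov_eax = bytes([0xB8]) + struct.pack("<I", IMM)
# mov rax, 123  ->  REX.W (48), C7 /0 with ModRM C0 (register rax), imm32 sign-extended to 64 bits
mov_rax = bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", IMM)

for name, enc in (("mov eax,123", mov_eax), ("mov rax,123", mov_rax)):
    total = len(enc) * REPS
    blocks = math.ceil(total / FETCH_BLOCK)
    print(f"{name}: {enc.hex(' ')} -> {len(enc)} bytes, {total} bytes total, {blocks} fetch blocks")
```

Under these assumptions the 1000 copies take 5000 bytes (313 fetch blocks) for mov eax,123 and 7000 bytes (438 fetch blocks) for mov rax,123, so the two tests exercise the front end differently even though both execute 1000 instructions. `bytes.hex` with a separator needs Python 3.8 or later.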


mineiro

The file below has counter configurations for some processors; check which ones work on yours.
If you try this under Windows, change:
WINDOWS  EQU    0
to
WINDOWS  EQU    1

comment & ---------------------------------------------------------------------
                          PMCTestB64.asm              © 2013-08-20 Agner Fog

                PMC Test program for multiple threads

This program is intended for testing the performance of a little piece of
code written in assembly language.
The code to test is inserted at the place marked "Test code start".
All sections that can be modified by the user are marked with ###########.

The code to test will be executed REPETITIONS times and the test results
will be output for each repetition. This program measures how many clock
cycles the code to test takes in each repetition. Furthermore, it is
possible to set a number of Performance Monitor Counters (PMC) to count
the number of micro-operations (uops), cache misses, branch mispredictions,
etc.

The setup of the Performance Monitor Counters is microprocessor-specific.
The specifications for PMC setup for each microprocessor family is defined
in the tables CounterDefinitions and CounterTypesDesired.

See PMCTest.txt for instructions.

© 2000-2013 GNU General Public License www.gnu.org/licenses

----------------------------------------------------------------------------- &

; Operating system: 0 = Linux, 1 = Windows
WINDOWS  EQU    0

; Define whether AVX and YMM registers used
USEAVX        = 0

; Define cache line size (to avoid threads sharing cache lines):
CACHELINESIZE = 64

DATA SEGMENT ALIGN(CACHELINESIZE)

;##############################################################################
;#
;#            List of desired counter types and other user definitions
;#
;##############################################################################
 
; Here you can select which performance monitor counters you want for your test.
; Select id numbers from the table CounterDefinitions[] in PMCTestA.cpp.

USE_PERFORMANCE_COUNTERS  equ  1        ; Tell if you are using performance counters

CounterTypesDesired label DWORD
    dd      0       
    DD      1        ; core cycles (Intel only)   // core clock cycles
;dd 2                ;Ref cyc                     // Reference clock cycles
;DD 9                ;instructions                // Instructions (reference counter)
;dd 10               ;instructions                // Instructions
;dd 22               ;ILenStal                    // instruction length decoder stall due to length changing prefix
;dd 24               ;Loop uops                   // uops from loop stream detector
;dd 25               ;Dec uops                    // uops from decoders. (MITE = Micro-instruction Translation Engine)
;dd 26               ;Cach uops                   // uops from uop cache. (DSB = Decoded Stream Buffer)
;DD 100              ;uops                        // uops retired, unfused domain
;dd 104              ;uops RAT                    // uops from RAT to RS
;dd 111              ;res.stl                     // any resource stall
;dd 131              ;AVX trans                   // VEX - non-VEX transition penalties
;dd 150              ;uop p0                      // uops port 0.
;dd 151              ;uop p1                      // uops port 1.
;dd 152              ;uop p2                      // uops port 2.
;dd 153              ;uop p3                      // uops port 3.
;dd 154              ;uop p4                      // uops port 4.
;dd 155              ;uop p5                      // uops port 5.
;dd 156              ;uop p6                      // uops port 6.
;dd 157              ;uop p7                      // uops port 7.
;dd 160              ;uop p07                     // uops port 0-7
;dd 201              ;BrTaken                     // branches taken
;dd 207              ;BrMispred                   // mispredicted branches
;dd 220              ;Mov elim                    // register moves eliminated
;dd 221              ;Mov elim-                   // register moves elimination unsuccessful
dd 310              ;Codemiss                    // code cache misses
dd 311              ;L1D Miss                    // level 1 data cache miss
dd 320              ;L2 Miss                     // level 2 cache misses
;    DD    101        ; data cache misses

   
; Number of counters defined
IF USE_PERFORMANCE_COUNTERS
NUM_COUNTERS = ($ - CounterTypesDesired) / 4
ELSE
NUM_COUNTERS = 0
ENDIF

; Number of repetitions of test.
REPETITIONS = 12

; Number of threads
NUM_THREADS = 1

; Subtract overhead from clock counts (0 if not)
SUBTRACT_OVERHEAD = 1

; Number of repetitions in loop to find overhead
OVERHEAD_REPETITIONS = 4

; Maximum number of PMC counters
MAXCOUNTERS = 6              ; must match value in PMCTest.h

IF NUM_COUNTERS GT MAXCOUNTERS
   NUM_COUNTERS = MAXCOUNTERS
ENDIF

; Define array sizes
MAXREPEAT = REPETITIONS

;------------------------------------------------------------------------------
;
;                  global data
;
;------------------------------------------------------------------------------

public NumCounters, MaxNumCounters, EventRegistersUsed
public UsePMC, Counters, CounterTypesDesired
public PThreadData, ClockResultsOS, PMCResultsOS, NumThreads, ThreadDataSize
public RatioOut, TempOut, RatioOutTitle, TempOutTitle


; Per-thread data:
ALIGN   CACHELINESIZE
; Data for first thread
ThreadData label dword                                     ; beginning of thread data block
CountTemp        DD    MAXCOUNTERS + 1            dup (0)  ; temporary storage of counts
CountOverhead    DD    MAXCOUNTERS + 1            dup (-1) ; temporary storage of count overhead
ClockResults     DD    REPETITIONS                dup (0)  ; clock counts
PMCResults       DD    REPETITIONS * MAXCOUNTERS  dup (0)  ; PMC counts
align 8
RSPSave          DQ    0                                 ; save stack pointer
ALIGN   CACHELINESIZE                  ; Make sure threads don't use same cache lines
THREADDSIZE = (offset $ - offset ThreadData)          ; size of data block for each thread

; Define data blocks of same size for remaining threads
IF NUM_THREADS GT 1
DB (NUM_THREADS - 1) * THREADDSIZE DUP (0)
ENDIF

; Global data
PThreadData     DQ    ThreadData               ; Pointer to measured data for all threads
NumCounters     DD    0                        ; Will be number of valid counters
MaxNumCounters  DD    NUM_COUNTERS             ; Tell PMCTestA.CPP length of CounterTypesDesired
UsePMC          DD    USE_PERFORMANCE_COUNTERS ; Tell PMCTestA.CPP if RDPMC used. Driver needed
NumThreads      DD    NUM_THREADS              ; Number of threads
ThreadDataSize  DD    THREADDSIZE              ; Size of each thread data block
ClockResultsOS  DD    ClockResults-ThreadData  ; Offset to ClockResults
PMCResultsOS    DD    PMCResults-ThreadData    ; Offset to PMCResults
Counters        DD    MAXCOUNTERS dup (0)      ; Counter register numbers used will be inserted here
EventRegistersUsed DD MAXCOUNTERS dup (0)      ; Set by MTMonA.cpp


; optional extra output column definitions
RatioOut      DD   0, 0, 0, 0                ; optional ratio output. See PMCTest.h
TempOut       DD   0, 0                      ; optional arbitrary output. See PMCTest.h
RatioOutTitle DQ   0                         ; optional column heading
TempOutTitle  DQ   0                         ; optional column heading


;##############################################################################
;#
;#                 User data
;#
;##############################################################################
ALIGN   CACHELINESIZE

; Put any data definitions your test code needs here

d0 label dword
q0 label qword
UserData         DD    1000H dup (0)


;------------------------------------------------------------------------------
;
;                  Macro definitions
;
;------------------------------------------------------------------------------

SERIALIZE MACRO             ; serialize CPU
       xor     eax, eax
       cpuid
ENDM

CLEARXMMREG MACRO N         ; set xmm(N) register to 0
        pxor xmm&N, xmm&N
ENDM

CLEARALLXMMREG MACRO        ; set all xmm registers to 0
IF  USEAVX
        VZEROALL            ; clear all ymm registers
ELSE       
I = 0
REPT 16
        CLEARXMMREG %I      ; clear all xmm registers
I = I + 1
ENDM
ENDIF
ENDM

;------------------------------------------------------------------------------
;
;                  Test Loop
;
;------------------------------------------------------------------------------
.code

;extern "C" int TestLoop (int thread) {
; This function runs the code to test REPETITIONS times
; and reads the counters before and after each run:

TestLoop PROC
        push    rbx
        push    rbp
        push    r12
        push    r13
        push    r14
        push    r15
IF      WINDOWS                    ; These registers must be saved in Windows, not in Linux
        push    rsi
        push    rdi
        sub     rsp, 0A8H           ; Space for saving xmm6 - 15 and align
        movaps  [rsp], xmm6
        movaps  [rsp+10H], xmm7
        movaps  [rsp+20H], xmm8
        movaps  [rsp+30H], xmm9
        movaps  [rsp+40H], xmm10
        movaps  [rsp+50H], xmm11
        movaps  [rsp+60H], xmm12
        movaps  [rsp+70H], xmm13
        movaps  [rsp+80H], xmm14
        movaps  [rsp+90H], xmm15       
        mov     r15d, ecx          ; Thread number
ELSE    ; Linux
        mov     r15d, edi          ; Thread number
ENDIF
       
; Register use:
;   r13: pointer to thread data block
;   r14: loop counter
;   r15: thread number
;   rax, rbx, rcx, rdx: scratch
;   all other registers: available to user program


;##############################################################################
;#
;#                 User Initializations
;#
;##############################################################################
; You may add any initializations your test code needs here.
; Registers esi, edi, ebp and r8 - r12 will be unchanged from here to the
; Test code start.
;

        finit                ; clear all FP registers
       
        CLEARALLXMMREG       ; clear all xmm or ymm registers

        lea rsi, d0
        lea rdi,[rsi+120h]
        xor ebp,ebp
       

;##############################################################################
;#
;#                 End of user Initializations
;#
;##############################################################################

        lea     r13, [ThreadData]             ; address of first thread data block
        ;imul    eax, r15d, THREADDSIZE       ; offset to thread data block
        DB      41H, 69H, 0C7H                ; fix bug in ml64
        DD      THREADDSIZE
        add     r13, rax                      ; address of current thread data block
        mov     [r13+(RSPSave-ThreadData)],rsp ; save stack pointer

IF  SUBTRACT_OVERHEAD
; First test loop. Measure empty code
        xor     r14d, r14d                    ; Loop counter

TEST_LOOP_1:

        SERIALIZE
     
        ; Read counters
        I = 0
REPT    NUM_COUNTERS
        mov     ecx, [Counters + I*4]
        rdpmc
        mov     [r13 + I*4 + 4 + (CountTemp-ThreadData)], eax
        I = I + 1
ENDM       

        SERIALIZE

        ; read time stamp counter
        rdtsc
        mov     [r13 + (CountTemp-ThreadData)], eax

        SERIALIZE

        ; Empty. Test code goes here in next loop

        SERIALIZE

        ; read time stamp counter
        rdtsc
        sub     [r13 + (CountTemp-ThreadData)], eax        ; CountTemp[0]

        SERIALIZE

        ; Read counters
        I = 0
REPT    NUM_COUNTERS
        mov     ecx, [Counters + I*4]
        rdpmc
        sub     [r13 + I*4 + 4 + (CountTemp-ThreadData)], eax  ; CountTemp[I+1]
        I = I + 1
ENDM       

        SERIALIZE

        ; find minimum counts
        I = 0
REPT    NUM_COUNTERS + 1
        mov     eax, [r13+I*4+(CountTemp-ThreadData)]       ; -count
        neg     eax
        mov     ebx, [r13+I*4+(CountOverhead-ThreadData)]   ; previous count
        cmp     eax, ebx
        cmovb   ebx, eax
        mov     [r13+I*4+(CountOverhead-ThreadData)], ebx   ; minimum count       
        I = I + 1
ENDM       
       
        ; end first test loop
        inc     r14d
        cmp     r14d, OVERHEAD_REPETITIONS
        jb      TEST_LOOP_1

ENDIF   ; SUBTRACT_OVERHEAD

       
; Second test loop. Measure user code
        xor     r14d, r14d                    ; Loop counter

TEST_LOOP_2:

        SERIALIZE
     
        ; Read counters
        I = 0
REPT    NUM_COUNTERS
        mov     ecx, [Counters + I*4]
        rdpmc
        mov     [r13 + I*4 + 4 + (CountTemp-ThreadData)], eax
        I = I + 1
ENDM       

        SERIALIZE

        ; read time stamp counter
        rdtsc
        mov     [r13 + (CountTemp-ThreadData)], eax

        SERIALIZE

;##############################################################################
;#
;#                 Test code start
;#
;##############################################################################

; Put the assembly code to test here
; Don't modify r13, r14, r15!

rept 1000        ; 1000 copies of the instruction under test
mov eax,123
;mov rax,123
;shr eax,5

endm



;##############################################################################
;#
;#                 Test code end
;#
;##############################################################################

        SERIALIZE

        ; read time stamp counter
        rdtsc
        sub     [r13 + (CountTemp-ThreadData)], eax        ; CountTemp[0]

        SERIALIZE

        ; Read counters
        I = 0
REPT    NUM_COUNTERS
        mov     ecx, [Counters + I*4]
        rdpmc
        sub     [r13 + I*4 + 4 + (CountTemp-ThreadData)], eax  ; CountTemp[I+1]
        I = I + 1
ENDM       

        SERIALIZE

        ; subtract counts before from counts after
        mov     eax, [r13 + (CountTemp-ThreadData)]            ; -count
        neg     eax
IF      SUBTRACT_OVERHEAD
        sub     eax, [r13+(CountOverhead-ThreadData)]   ; overhead clock count       
ENDIF   ; SUBTRACT_OVERHEAD       
        mov     [r13+r14*4+(ClockResults-ThreadData)], eax      ; save clock count
       
        I = 0
REPT    NUM_COUNTERS
        mov     eax, [r13 + I*4 + 4 + (CountTemp-ThreadData)]
        neg     eax
IF      SUBTRACT_OVERHEAD
        sub     eax, [r13+I*4+4+(CountOverhead-ThreadData)]   ; overhead pmc count       
ENDIF   ; SUBTRACT_OVERHEAD       
        mov     [r13+r14*4+I*4*REPETITIONS+(PMCResults-ThreadData)], eax      ; save count       
        I = I + 1
ENDM       
       
        ; end second test loop
        inc     r14d
        cmp     r14d, REPETITIONS
        jb      TEST_LOOP_2

        ; clean up
        mov     rsp, [r13+(RSPSave-ThreadData)]   ; restore stack pointer       
        finit
        cld
IF USEAVX
        VZEROALL                 ; clear all ymm registers
ENDIF       

        ; return REPETITIONS;
        mov     eax, REPETITIONS   ; return value
       
IF      WINDOWS                    ; Restore registers saved in Windows
        movaps  xmm6, [rsp]
        movaps  xmm7, [rsp+10H]
        movaps  xmm8, [rsp+20H]
        movaps  xmm9, [rsp+30H]
        movaps  xmm10, [rsp+40H]
        movaps  xmm11, [rsp+50H]
        movaps  xmm12, [rsp+60H]
        movaps  xmm13, [rsp+70H]
        movaps  xmm14, [rsp+80H]
        movaps  xmm15, [rsp+90H]
        add     rsp, 0A8H           ; Free space for saving xmm6 - 15
        pop     rdi
        pop     rsi
ENDIF
        pop     r15
        pop     r14
        pop     r13
        pop     r12
        pop     rbp
        pop     rbx
        ret
       
TestLoop ENDP

END
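For readers who prefer a higher-level view of what TestLoop does with the raw counts: the first loop keeps the minimum of OVERHEAD_REPETITIONS empty measurements, the second loop subtracts that minimum from each real measurement, and PMCResults is stored counter by counter, so counter I in repetition r lands at index I*REPETITIONS + r (the `r14*4 + I*4*REPETITIONS` addressing above). A hypothetical Python model of that bookkeeping, purely as an illustration; it ignores the 32-bit wrap-around and the unsigned comparison (cmovb) of the assembly, and it sizes PMCResults by the counters in use rather than MAXCOUNTERS.

```python
def overhead(empty_runs):
    """Per-column minimum over the empty-loop measurements
    (column 0 = clock count, columns 1.. = PMC counters)."""
    return [min(col) for col in zip(*empty_runs)]

def bookkeeping_model(empty_runs, test_runs, repetitions):
    """Hypothetical model of TestLoop with SUBTRACT_OVERHEAD = 1.
    Each row is [clock, pmc0, pmc1, ...] as after-minus-before counts.
    Returns (clock_results, pmc_results); pmc_results is flat, with
    counter i of repetition r at index i * repetitions + r."""
    base = overhead(empty_runs)
    num_counters = len(base) - 1
    clock_results = [0] * repetitions
    pmc_results = [0] * (repetitions * num_counters)
    for r, row in enumerate(test_runs[:repetitions]):
        clock_results[r] = row[0] - base[0]
        for i in range(num_counters):
            pmc_results[i * repetitions + r] = row[i + 1] - base[i + 1]
    return clock_results, pmc_results
```

For example, empty runs `[[30, 5, 2], [28, 6, 2], [29, 4, 3]]` give an overhead of `[28, 4, 2]`, and a test run `[128, 104, 102]` is then stored as clock 100 and counts 100, 100.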

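The `DB 41H, 69H, 0C7H` / `DD THREADDSIZE` lines in the listing hand-encode the commented-out `imul eax, r15d, THREADDSIZE`. As an illustration of where those bytes come from, this Python sketch builds the encoding from its fields (REX prefix, opcode 69h for IMUL r32, r/m32, imm32, ModRM with mod=11). `encode_imul_r32_r32_imm32` is a hypothetical name, and only register-to-register operands are handled.

```python
import struct

def encode_imul_r32_r32_imm32(dst, src, imm):
    """Encode IMUL dst32, src32, imm32 (opcode 69 /r id), with dst and src
    given as register numbers 0-15 (0 = eax, ..., 15 = r15d)."""
    # REX = 0100WRXB: W=0 (32-bit operands), R extends ModRM.reg (dst), B extends ModRM.rm (src)
    rex = 0x40 | ((dst >> 3) << 2) | (src >> 3)
    # ModRM: mod=11 (register operand), reg = low 3 bits of dst, rm = low 3 bits of src
    modrm = 0xC0 | ((dst & 7) << 3) | (src & 7)
    prefix = bytes([rex]) if rex != 0x40 else b""   # a bare REX 40h adds nothing here
    return prefix + bytes([0x69, modrm]) + struct.pack("<i", imm)

EAX, R15 = 0, 15
print(encode_imul_r32_r32_imm32(EAX, R15, 0x100).hex(" "))  # 41 69 c7 00 01 00 00
```

The first three bytes, 41 69 C7, match the DB line, and the following four bytes are the little-endian immediate that the DD line supplies.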

mineiro

So, on Linux we have more freedom to use some optimized functions.
One example is the link below, which shows how to insert an optimized function into an executable file (without needing its source code):
https://masm32.com/board/index.php?topic=8736.0


NoCforMe

Thanks for this exhaustive description of your excruciating installation process. However, instead of doing this, I think I'll just sit in the corner and hit myself over the head with a hammer a few times ...
Assembly language programming should be fun. That's why I do it.

jj2007


mineiro

The tool works; I was using it wrongly. The mov eax,123 versus mov rax,123 result was my mistake.
That doesn't invalidate this topic.

Biterider

Hi
@mineiro, it takes a lot of courage to publicly admit you were wrong and I respect that very much. 
It shows a maturity not often seen.

Also, I would like to remind everyone that we have a rule on the forum that hasn't been followed very closely:

Quote from: hutch-- on May 18, 2012, 06:53:06 PM
Respect For Members
2. The basis of the rules in the forum is respect for other members. Respect for new members who are struggling with assembler, respect for experienced members who are here to help and respect for differences between members. This cannot be done by rigid compliance to stated rules and is based on intent. Please remember that every person who posts in the forum is another human being and deserves to be treated like one.

I would like to ask everyone to behave like adults and to think for a moment before writing an answer that could be misunderstood.
In addition, everyone should try to actively avoid escalations.
This will help us avoid unpleasant situations in the future.

Biterider

NoCforMe

Yes, I know all about that rule.

Before you go all net-nanny on us, please realize that this does not mean that we cannot criticize, point out flaws in reasoning, or just plain make fun of something that deserves it, OK? Sheesh. Seems like some folks here have their offense detectors set on a hair-trigger.

mineiro

Ad hominem