Just threw this li'l profiler together, thought it might be useful to someone here. Console program, 32-bit. You just paste in your code, add elements to a structure:
PROFILE <TestCode1, 5, $DESC("REP STOSB")>
PROFILE <TestCode2, 5, $DESC("REP MOVSB")>
PROFILE <TestCode3, 5, $DESC("REP MOVSD")>
PROFILE <TestCode4, 5, $DESC("MOV by DWORDs")>
And hey, @JJ, you'll notice I actually used a macro! Can you believe it? It does make things a lot easier here.
;============================================
; Profiler--a lightweight code profiler
; See how fast (or not) your code is
;============================================
.nolist
include \masm32\include\masm32rt.inc
.list
.586 ;Needed for RDTSC
;============================================
; Defines, structures, prototypes, etc.
;============================================
$CRLF EQU <13, 10>
$tab EQU 9
;***** Profile structure *****
PROFILE STRUCT
ep DD ? ;Module entry pt.
runX DD ? ;How many times to run it
desc DD ? ;Ptr. to module description
time0lo DD ? ;Low DWORD of start time
time0hi DD ? ;High DWORD " "
time1lo DD ? ;Low DWORD of end time
time1hi DD ? ;High DWORD " "
PROFILE ENDS
$DESC MACRO descText
LOCAL txtvar
.const
txtvar DB descText, 0
.data
EXITM <txtvar>
ENDM
;============================================
; HERE BE DATA
;============================================
.data
;***** Profile List *****
;
; Create the list of PROFILE structures here.
; Format:
; PROFILE <entry point, # times to run, $DESC("description")>
ProfileList LABEL PROFILE
PROFILE <TestCode1, 5, $DESC("REP STOSB")>
PROFILE <TestCode2, 5, $DESC("REP MOVSB")>
PROFILE <TestCode3, 5, $DESC("REP MOVSD")>
PROFILE <TestCode4, 5, $DESC("MOV by DWORDs")>
; IMPORTANT: put this in so we know where the list ends!
PROFILE <-1>
IDstr LABEL BYTE
DB $CRLF, "Profiler by Fear No Evil Software, @2024", $CRLF, $CRLF, 0
ResultsFmt1 LABEL BYTE
DB "------------------------------------------------------", $CRLF
DB "Timings for %s:", $CRLF, 0
ResultsFmt2 LABEL BYTE
DB $tab, "%u. %u cycles", $CRLF, 0
ResultsFmt3 LABEL BYTE
DB "Average: %u cycles.", $CRLF, 0
ResultsEndStr LABEL BYTE
DB "------------------------------------------------------", $CRLF, $CRLF, 0
;============================================
; UNINITIALIZED DATA
;============================================
.data?
InstanceHandle HINSTANCE ?
;============================================
; CODE LIVES HERE
;============================================
.code
start: INVOKE GetModuleHandle, NULL
MOV InstanceHandle, EAX
CALL WinMain
INVOKE ExitProcess, 0 ;Always successful.
;====================================================================
; Mainline proc
;====================================================================
WinMain PROC
LOCAL runNum:DWORD, timeSum:DWORD, buffer[256]:BYTE
INVOKE StdOut, OFFSET IDstr
MOV EBX, OFFSET ProfileList
nxtpro: CMP [EBX].PROFILE.ep, -1 ;End o'list?
JE done
INVOKE wsprintf, ADDR buffer, OFFSET ResultsFmt1, [EBX].PROFILE.desc
INVOKE StdOut, ADDR buffer
MOV runNum, 1
MOV timeSum, 0
; Run the test "runX" times:
MOV ECX, [EBX].PROFILE.runX ;# of iterations.
runtst: PUSH ECX
RDTSC ;Get start time.
MOV [EBX].PROFILE.time0lo, EAX
MOV [EBX].PROFILE.time0hi, EDX
CALL [EBX].PROFILE.ep
RDTSC ;Get end time.
MOV [EBX].PROFILE.time1lo, EAX
MOV [EBX].PROFILE.time1hi, EDX
; Compute elapsed time:
; We're only using the low DWORD of the time, since total elapsed time
; rarely exceeds the low tens or hundreds of thousands.
SUB EAX, [EBX].PROFILE.time0lo
ADD timeSum, EAX
; Format & display result text:
INVOKE wsprintf, ADDR buffer, OFFSET ResultsFmt2, runNum, EAX
INVOKE StdOut, ADDR buffer
POP ECX
INC runNum
LOOP runtst
; Calculate average and display it:
MOV EAX, timeSum
XOR EDX, EDX
DIV [EBX].PROFILE.runX
INVOKE wsprintf, ADDR buffer, OFFSET ResultsFmt3, EAX
INVOKE StdOut, ADDR buffer
INVOKE StdOut, OFFSET ResultsEndStr
ADD EBX, SIZEOF PROFILE
JMP nxtpro
done: RET
WinMain ENDP
;====================================================================
; T E S T C O D E A R E A
;
; Put your code routines here to be timed
;====================================================================
TestCode1 PROC
LOCAL buffer[1024]:BYTE
PUSH EDI
MOV ECX, SIZEOF buffer
MOV AL, 17
LEA EDI, buffer
REP STOSB
POP EDI
RET
TestCode1 ENDP
TestCode2 PROC
LOCAL buffer1[1024]:BYTE, buffer2[1024]:BYTE
PUSH ESI
PUSH EDI
LEA ESI, buffer1
LEA EDI, buffer2
MOV ECX, SIZEOF buffer2
REP MOVSB
POP EDI
POP ESI
RET
TestCode2 ENDP
TestCode3 PROC
LOCAL buffer1[1024]:BYTE, buffer2[1024]:BYTE
PUSH ESI
PUSH EDI
LEA ESI, buffer1
LEA EDI, buffer2
MOV ECX, SIZEOF buffer2 / SIZEOF DWORD
REP MOVSD
POP EDI
POP ESI
RET
TestCode3 ENDP
TestCode4 PROC
LOCAL buffer1[1024]:BYTE, buffer2[1024]:BYTE
PUSH EBX
LEA EBX, buffer1
LEA EDX, buffer2
MOV ECX, SIZEOF buffer2 / SIZEOF DWORD
@@: MOV EAX, [EBX]
MOV [EDX], EAX
ADD EBX, SIZEOF DWORD
ADD EDX, SIZEOF DWORD
LOOP @B
POP EBX
RET
TestCode4 ENDP
END start
Question: what is the "real" profiler that everyone here uses to test their code? Is it part of the MASM32 package? (or MASM64?)
Quote from: NoCforMe on November 26, 2024, 08:40:57 AM
what is the "real" profiler that everyone here uses to test their code?
StackWalk and SetWatch (https://masm32.com/board/index.php?topic=9966.msg109226#msg109226)
Quote from: jj2007 on November 26, 2024, 09:48:34 AM
Quote from: NoCforMe on November 26, 2024, 08:40:57 AM
what is the "real" profiler that everyone here uses to test their code?
StackWalk and SetWatch (https://masm32.com/board/index.php?topic=9966.msg109226#msg109226)
JJ: You know I don't use MasmBasic, as much as I admire it.
Isn't there a regular Masm32 tool that people use here?
I swear, sometimes I think you post stuff like this just to needle me ...
Hmm, I guess I missed this at the end of that post:
Quote
P.S.: By changing line 1 to useMB=0, the whole program becomes standard Masm32 SDK code and does no longer require MasmBasic
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997. All rights reserved.
Assembling: MiniWinMasm32Profiling.asm
##########################
You CANNOT use the MasmBasic library with
the old ml.exe version 6.14 - use UAsm instead
##########################
MiniWinMasm32Profiling.asm(27) : error A2070: invalid instruction operands
SWalkE(23): Macro Called From
MiniWinMasm32Profiling.asm(27): Main Line Code
*** MasmBasic version 10.06.2022 ***
* Warning: SQWORD is unsigned with this assembler *
MiniWinMasm32Profiling.asm(121) : error A2070: invalid instruction operands
SWalkE(23): Macro Called From
MiniWinMasm32Profiling.asm(121): Main Line Code
MiniWinMasm32Profiling.asm(124) : error A2070: invalid instruction operands
SWalkE(23): Macro Called From
swEnd(214): Macro Called From
StackWalk(2): Macro Called From
MiniWinMasm32Profiling.asm(124): Main Line Code
MiniWinMasm32Profiling.asm(124) : error A2070: invalid instruction operands
SWalkE(23): Macro Called From
swEnd(220): Macro Called From
StackWalk(2): Macro Called From
MiniWinMasm32Profiling.asm(124): Main Line Code
\masm32\MasmBasic\MasmBasic.inc(733) : error A2052: forced error
TestMasmVersion(8): Macro Called From
\masm32\MasmBasic\MasmBasic.inc(733): Include File
_
Assembly Error
Fuggedaboudit. Life is too damn short.
@ jj2007
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997. All rights reserved.
Assembling: MiniWinMasm32Profiling.asm
##########################
You CANNOT use the MasmBasic library with
the old ml.exe version 6.14 - use UAsm instead
MasmBasic? Rename it to UasmBasic, maybe? :tongue:
@NoCforMe What is your profiler used for? What does it do?
Edit later:
Ok, I looked at the code. Counting cycles? Timing?
So basically it is just an algorithm timing/cycle counting testbed?
I thought that you didn't care about faster code, or the timing of algorithms, judging by some of your replies in The Laboratory.
Quote from: zedd151 on November 26, 2024, 02:46:38 PM
I thought that you didn't care about faster code, or the timing of algorithms, judging by some of your replies in The Laboratory.
It's not that I don't care at all, if you've followed my trouble with my BMP info viewer program, which was running terribly slooooow.
It's just that I'm not speed-obsessed like some other folks here seem to be. But sure, who doesn't want their code to be fast?
Anyhow, it's a really simple profiler. You might want to try it out. All you have to do is stick the code you want to test in the source file (as a subroutine, a PROC), make an entry in the PROFILE structure list, and Bob's your uncle, as they say.
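To make that concrete, here's roughly what adding a fifth test would look like. This is a fragment to paste into the profiler source above, not a standalone program, and TestCode5 and its "REP STOSD" description are made up for illustration:

```asm
; 1. In the Profile List, add an entry BEFORE the PROFILE <-1> terminator:
        PROFILE <TestCode5, 5, $DESC("REP STOSD")>

; 2. In the test code area, add the routine itself
;    (hypothetical example: fill a 1K buffer one DWORD at a time):
TestCode5 PROC
        LOCAL   buffer[1024]:BYTE

        PUSH    EDI                             ;Preserve EDI for the caller
        MOV     ECX, SIZEOF buffer / SIZEOF DWORD
        MOV     EAX, 11111111h                  ;Fill pattern
        LEA     EDI, buffer
        REP     STOSD
        POP     EDI
        RET
TestCode5 ENDP
```

Whatever you put in the routine, it should leave EBX alone (or save and restore it), since the main loop keeps its pointer to the current PROFILE entry there.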
I use a simpler method, using only GetTickCount. It's good enough for Gubbamint work; i.e., getting an algorithm that used to take over 4 seconds for 1000 reps, down to under a second for 1000 reps. :tongue:
But I'll check your profiler out later on some time. Maybe the next time I have some code that seems a little sluggish.
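For anyone following along, the GetTickCount approach might look something like the sketch below. It assumes the standard MASM32 SDK setup; Algo, REPS, fmtMs, startMs and outbuf are placeholder names, and this is not zedd151's actual code:

```asm
include \masm32\include\masm32rt.inc

REPS    EQU 1000                        ;How many times to run the algorithm

.data
fmtMs   DB "%u ms for %u reps", 13, 10, 0

.data?
startMs DD ?                            ;Tick count before the run
outbuf  DB 64 DUP(?)

.code
start:  INVOKE  GetTickCount
        MOV     startMs, EAX
        MOV     EBX, REPS
again:  CALL    Algo                    ;Code being timed
        DEC     EBX
        JNZ     again
        INVOKE  GetTickCount
        SUB     EAX, startMs            ;Elapsed milliseconds
        INVOKE  wsprintf, ADDR outbuf, ADDR fmtMs, EAX, REPS
        INVOKE  StdOut, ADDR outbuf
        INVOKE  ExitProcess, 0

Algo    PROC                            ;Stand-in for the real algorithm
        MOV     ECX, 100000
@@:     DEC     ECX
        JNZ     @B
        RET
Algo    ENDP

END start
```

Running the routine many times and timing the whole batch is what makes a millisecond timer usable here; a single call would be far too short to register.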
Well, in your own code, instead of using GetTickCount(), which has millisecond resolution at best (typically 10 to 16 ms in practice), use RDTSC. It's a 2-byte processor instruction (opcode 0F 31) that reads the CPU's time-stamp counter, which is far finer-grained. It returns a 64-bit value in EDX:EAX. It's what I'm using in this profiler. (I'm only using the low DWORD in EAX.)
You need to include the .586 directive to use this instruction.
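If a run ever takes long enough to overflow the low DWORD (roughly 4.29 billion counts), getting the full 64-bit difference needs only one extra instruction. A minimal sketch, assuming the standard MASM32 SDK setup; t0lo, t0hi, fmt64 and outbuf are made-up names, and the Sleep call is just a stand-in for the code being timed:

```asm
include \masm32\include\masm32rt.inc
.586                                    ;Needed for RDTSC

.data
fmt64   DB "Elapsed: %08X%08X (hex)", 13, 10, 0

.data?
t0lo    DD ?                            ;Low DWORD of start time
t0hi    DD ?                            ;High DWORD of start time
outbuf  DB 64 DUP(?)

.code
start:  RDTSC                           ;Start time in EDX:EAX
        MOV     t0lo, EAX
        MOV     t0hi, EDX

        INVOKE  Sleep, 10               ;Stand-in for the code being timed

        RDTSC                           ;End time in EDX:EAX
        SUB     EAX, t0lo               ;Low DWORDs; borrow lands in carry flag
        SBB     EDX, t0hi               ;High DWORDs, minus the borrow
        ; EDX:EAX now holds the elapsed count as a 64-bit value

        INVOKE  wsprintf, ADDR outbuf, ADDR fmt64, EDX, EAX
        INVOKE  StdOut, ADDR outbuf
        INVOKE  ExitProcess, 0
END start
```

The result is printed as two hex DWORDs (high, then low) to keep the formatting simple.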
Set the maximum available instruction set to use:
.686p
.XMM
Many old coders came from slow CPUs, where it was necessary to make code as fast as possible. In the old-school demoscene, before pixel shaders took over, the aim was to make 256-byte, 1K, 4K and 64K intros both fast and small. Winning first prize at a demoscene competition can be a coder's goal when you lack the athletic body to run and win the 200 m.
In the Laboratory we enjoyed speeding up a sort 28x.
demoscene pixelshaders XMM SIMD blah blah blah ...