Function LowerCase Naked (s As String) As ZString Ptr
  Asm
    .data
    ' 256-entry lookup table: 1 for the codes of 'A'..'Z' (65..90), 0 everywhere else
    lcase_table:
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    .text
    mov rdx,QWORD PTR [rcx]     ' rcx = string descriptor; its first qword is the data pointer
    mov r10,OFFSET lcase_table
    sub rdx,1                   ' pre-decrement so the loop can increment first
  _loop:
    add rdx,1
    movzx rax,BYTE PTR [rdx]    ' load the current character
    movzx r8,BYTE PTR [r10+rax] ' 1 if it is 'A'..'Z', else 0
    shl r8,5                    ' 1 -> 32, the distance between 'A' and 'a'
    add BYTE PTR [rdx],r8b      ' add 32 to uppercase letters only; sets ZF on the terminating null
    jnz _loop
    mov rax,QWORD PTR [rcx]     ' return the pointer to the (modified) string data
    ret
  End Asm
End Function
Dim As String t="THIS is a TEST."
Print *Lowercase(t)
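For readers without a FreeBASIC toolchain, the same table-driven trick can be sketched in plain C (a hypothetical port, not the code above): a 256-entry table holds 1 only at the codes of 'A'..'Z', the flag is shifted left by 5 to yield 32, and that is added to each byte until the terminating null.

```c
#include <stdio.h>

/* Mirrors lcase_table from the asm: entry is 1 for 'A'..'Z', 0 elsewhere */
static unsigned char lcase_table[256];

static void init_table(void) {
    for (int c = 'A'; c <= 'Z'; c++)
        lcase_table[c] = 1;
}

/* In-place lowercase; returns its argument, like the asm returns rax */
static char *lowercase(char *s) {
    init_table();
    for (unsigned char *p = (unsigned char *)s; *p; p++)
        *p += (unsigned char)(lcase_table[*p] << 5); /* 1 << 5 == 32 == 'a' - 'A' */
    return s;
}
```

The table lookup avoids the pair of range compares a naive version needs per byte, which is the whole point of the asm original.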
Quote from: NoCforMe on July 04, 2025, 03:12:16 AM
Still, performance-wise: calling a proc that performs integer or float math on an array of integers or floats, using ECX to pass the length of the array(s) and one or more other registers as pointers to the start of the arrays, and maybe returning a fail value in EAX from the PROC, for example if it detects a divide by zero or otherwise fails to complete the math operation on the array(s).

Quote from: jj2007 on July 03, 2025, 08:11:59 PM
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.
Why so restrictive?
I hardly ever use the FPU in my programs, and have never messed around w/XMM. Most of my code is in integer-land.
Nothing wrong with using either one of those register sets to pass parameters, of course.
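The register-passing scheme NoCforMe describes above (length in ECX, pointers in other registers, failure code back in EAX) maps naturally onto an ordinary C signature, where the compiler assigns the registers per the calling convention. A minimal sketch with hypothetical names, not code from this thread:

```c
#include <stddef.h>

/* Element-wise divide a[] by b[] into out[]; n plays the role of ECX,
 * the pointers ride in whatever registers the ABI assigns.
 * Returns 0 on success, -1 on divide-by-zero (the "fail value in EAX"). */
static int divide_arrays(size_t n, const int *a, const int *b, int *out) {
    for (size_t i = 0; i < n; i++) {
        if (b[i] == 0)
            return -1;   /* abort: the error code travels back in EAX */
        out[i] = a[i] / b[i];
    }
    return 0;            /* all elements processed */
}
```

Under the Microsoft x64 convention the first four integer arguments land in RCX, RDX, R8 and R9 and the return value in RAX, so this C prototype ends up register-for-register close to the hand-rolled scheme.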
Quote from: jj2007 on July 03, 2025, 08:11:59 PM
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.
Why so restrictive?
Intel(R) Celeron(R) N5105 @ 2.00GHz (SSE4)
549 cycles for 100 * proc aligned 16
484 cycles for 100 * proc aligned 16+3
550 cycles for 100 * aligned push+pop
482 cycles for 100 * aligned reg32
551 cycles for 100 * proc aligned 16
484 cycles for 100 * proc aligned 16+3
551 cycles for 100 * aligned push+pop
482 cycles for 100 * aligned reg32
550 cycles for 100 * proc aligned 16
485 cycles for 100 * proc aligned 16+3
552 cycles for 100 * aligned push+pop
482 cycles for 100 * aligned reg32
551 cycles for 100 * proc aligned 16
493 cycles for 100 * proc aligned 16+3
562 cycles for 100 * aligned push+pop
493 cycles for 100 * aligned reg32
564 cycles for 100 * proc aligned 16
496 cycles for 100 * proc aligned 16+3
561 cycles for 100 * aligned push+pop
485 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
--- ok ---
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (SSE4)
344 cycles for 100 * proc aligned 16
256 cycles for 100 * proc aligned 16+3
391 cycles for 100 * aligned push+pop
387 cycles for 100 * aligned reg32
345 cycles for 100 * proc aligned 16
261 cycles for 100 * proc aligned 16+3
392 cycles for 100 * aligned push+pop
380 cycles for 100 * aligned reg32
345 cycles for 100 * proc aligned 16
265 cycles for 100 * proc aligned 16+3
403 cycles for 100 * aligned push+pop
381 cycles for 100 * aligned reg32
341 cycles for 100 * proc aligned 16
260 cycles for 100 * proc aligned 16+3
382 cycles for 100 * aligned push+pop
381 cycles for 100 * aligned reg32
382 cycles for 100 * proc aligned 16
260 cycles for 100 * proc aligned 16+3
374 cycles for 100 * aligned push+pop
389 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
--- ok ---
AMD Athlon(tm) II X2 220 Processor (SSE3)
505 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
502 cycles for 100 * proc aligned 16
403 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
503 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
502 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
408 cycles for 100 * aligned reg32
502 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
503 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.
mov eax, 31416 ; you can mix xmm registers with FPU and ordinary
movd xmm0, eax ; registers and directly print the result
fldpi ; load 3.14159 onto the FPU
mov ecx, 123 ; \n is CrLf, \t is tab in Str$()
Print Str$("\nresult=\t%f", xmm0/ST(0)*ecx) ; output: [newline] result= 1230003.0
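As a cross-check of the arithmetic in that snippet, 31416 / pi * 123 can be evaluated in plain C; it comes out at about 1230002.9, which Str$'s %f formatting rounds to the 1230003.0 shown in the comment.

```c
#include <math.h>

/* Reproduce xmm0/ST(0)*ecx from the snippet above */
static double mixed_result(void) {
    double xmm0 = 31416.0;      /* movd xmm0, eax  (eax = 31416) */
    double st0  = acos(-1.0);   /* fldpi: pi, computed portably here */
    int    ecx  = 123;          /* mov ecx, 123 */
    return xmm0 / st0 * ecx;
}
```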
Quote from: daydreamer on July 03, 2025, 05:11:13 PM
@NoCforMe
Passing values through registers works best in your own code, if you prefer using FPU regs or XMM regs for your REAL4/REAL8 variables as a coding style in your own PROCs.
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
322 cycles for 100 * proc aligned 16
267 cycles for 100 * proc aligned 16+3
392 cycles for 100 * aligned push+pop
392 cycles for 100 * aligned reg32
310 cycles for 100 * proc aligned 16
266 cycles for 100 * proc aligned 16+3
397 cycles for 100 * aligned push+pop
394 cycles for 100 * aligned reg32
308 cycles for 100 * proc aligned 16
269 cycles for 100 * proc aligned 16+3
408 cycles for 100 * aligned push+pop
392 cycles for 100 * aligned reg32
314 cycles for 100 * proc aligned 16
263 cycles for 100 * proc aligned 16+3
404 cycles for 100 * aligned push+pop
399 cycles for 100 * aligned reg32
308 cycles for 100 * proc aligned 16
267 cycles for 100 * proc aligned 16+3
395 cycles for 100 * aligned push+pop
391 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32