Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.
mov eax, 31416 ; you can mix xmm registers with FPU and ordinary
movd xmm0, eax ; registers and directly print the result
fldpi ; load 3.14159 onto the FPU
mov ecx, 123 ; \n is CrLf, \t is tab in Str$()
Print Str$("\nresult=\t%f", xmm0/ST(0)*ecx) ; output: [newline] result= 1230003.0
Quote from: daydreamer on July 03, 2025, 05:11:13 PM
@NoCforMe
Transferring through registers works best in your own code, if you prefer using FPU or XMM registers for your REAL4/REAL8 variables as the coding style for your own PROCs.
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
322 cycles for 100 * proc aligned 16
267 cycles for 100 * proc aligned 16+3
392 cycles for 100 * aligned push+pop
392 cycles for 100 * aligned reg32
310 cycles for 100 * proc aligned 16
266 cycles for 100 * proc aligned 16+3
397 cycles for 100 * aligned push+pop
394 cycles for 100 * aligned reg32
308 cycles for 100 * proc aligned 16
269 cycles for 100 * proc aligned 16+3
408 cycles for 100 * aligned push+pop
392 cycles for 100 * aligned reg32
314 cycles for 100 * proc aligned 16
263 cycles for 100 * proc aligned 16+3
404 cycles for 100 * aligned push+pop
399 cycles for 100 * aligned reg32
308 cycles for 100 * proc aligned 16
267 cycles for 100 * proc aligned 16+3
395 cycles for 100 * aligned push+pop
391 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
Quote from: daydreamer on July 02, 2025, 06:58:12 PM
32-bit fastcall (transferring data in registers) vs. 32-bit invoke (pushing data) would be the fairest test.
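For readers who want to see the two call styles side by side, here is a minimal sketch. It is not the testbed that produced the timings in this thread. The procedure names AddStack and AddRegs are hypothetical, and the choice of ECX/EDX as argument registers is an assumption that merely mirrors the usual 32-bit fastcall convention. It assumes a standard MASM32 SDK install, so that masm32rt.inc and the print / str$ macros are available.

```asm
include \masm32\include\masm32rt.inc    ; MASM32 runtime: model, includes, macros

.code

; Stack-based (stdcall): invoke pushes both arguments,
; and the generated epilogue removes them with ret 8.
AddStack proc val1:DWORD, val2:DWORD
    mov eax, val1
    add eax, val2
    ret
AddStack endp

; Register-based: the caller loads ECX and EDX, nothing is pushed.
; No parameters and no locals, so MASM emits no stack frame here.
AddRegs proc
    lea eax, [ecx+edx]          ; eax = ecx + edx
    ret
AddRegs endp

start:
    invoke AddStack, 100, 23    ; expands to push 23 / push 100 / call AddStack
    print str$(eax), 13, 10     ; eax holds 100 + 23

    mov ecx, 100                ; arguments go straight into registers
    mov edx, 23
    call AddRegs
    print str$(eax), 13, 10     ; same sum, no stack traffic for the arguments

    invoke ExitProcess, 0
end start
```

Build it as a console application. Note that the print macro may trash EAX, ECX and EDX, so the registers are reloaded before each call.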
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
400 cycles for 100 * proc aligned 16
400 cycles for 100 * proc aligned 16+3
417 cycles for 100 * aligned push+pop
273 cycles for 100 * aligned reg32
405 cycles for 100 * proc aligned 16
409 cycles for 100 * proc aligned 16+3
426 cycles for 100 * aligned push+pop
276 cycles for 100 * aligned reg32
409 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
422 cycles for 100 * aligned push+pop
290 cycles for 100 * aligned reg32
403 cycles for 100 * proc aligned 16
406 cycles for 100 * proc aligned 16+3
426 cycles for 100 * aligned push+pop
278 cycles for 100 * aligned reg32
406 cycles for 100 * proc aligned 16
416 cycles for 100 * proc aligned 16+3
421 cycles for 100 * aligned push+pop
281 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
D:\BCX>DllToInc64.exe
Usage : DllToInc64.exe DllFile.dll [optional -u]
Version 1.0
-u : Create include file for UNICODE API functions.
DllToInc64.exe kernel32.dll