Quote from: jj2007 on July 03, 2025, 08:11:59 PM
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.
Why so restrictive?
Intel(R) Celeron(R) N5105 @ 2.00GHz (SSE4)
549 cycles for 100 * proc aligned 16
484 cycles for 100 * proc aligned 16+3
550 cycles for 100 * aligned push+pop
482 cycles for 100 * aligned reg32
551 cycles for 100 * proc aligned 16
484 cycles for 100 * proc aligned 16+3
551 cycles for 100 * aligned push+pop
482 cycles for 100 * aligned reg32
550 cycles for 100 * proc aligned 16
485 cycles for 100 * proc aligned 16+3
552 cycles for 100 * aligned push+pop
482 cycles for 100 * aligned reg32
551 cycles for 100 * proc aligned 16
493 cycles for 100 * proc aligned 16+3
562 cycles for 100 * aligned push+pop
493 cycles for 100 * aligned reg32
564 cycles for 100 * proc aligned 16
496 cycles for 100 * proc aligned 16+3
561 cycles for 100 * aligned push+pop
485 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
--- ok ---
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (SSE4)
344 cycles for 100 * proc aligned 16
256 cycles for 100 * proc aligned 16+3
391 cycles for 100 * aligned push+pop
387 cycles for 100 * aligned reg32
345 cycles for 100 * proc aligned 16
261 cycles for 100 * proc aligned 16+3
392 cycles for 100 * aligned push+pop
380 cycles for 100 * aligned reg32
345 cycles for 100 * proc aligned 16
265 cycles for 100 * proc aligned 16+3
403 cycles for 100 * aligned push+pop
381 cycles for 100 * aligned reg32
341 cycles for 100 * proc aligned 16
260 cycles for 100 * proc aligned 16+3
382 cycles for 100 * aligned push+pop
381 cycles for 100 * aligned reg32
382 cycles for 100 * proc aligned 16
260 cycles for 100 * proc aligned 16+3
374 cycles for 100 * aligned push+pop
389 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
--- ok ---
AMD Athlon(tm) II X2 220 Processor (SSE3)
505 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
502 cycles for 100 * proc aligned 16
403 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
503 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
502 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
502 cycles for 100 * aligned push+pop
408 cycles for 100 * aligned reg32
502 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
503 cycles for 100 * aligned push+pop
403 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.
mov eax, 31416 ; you can mix xmm registers with FPU and ordinary
movd xmm0, eax ; registers and directly print the result
fldpi ; load 3.14159 onto the FPU
mov ecx, 123 ; \n is CrLf, \t is tab in Str$()
Print Str$("\nresult=\t%f", xmm0/ST(0)*ecx) ; output: [newline] result= 1230003.0
Quote from: daydreamer on July 03, 2025, 05:11:13 PM
@NoCforMe
Passing data through registers works best in your own code, if you prefer using FPU or XMM registers for your REAL4/REAL8 variables as the coding style for your own PROCs.
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)
322 cycles for 100 * proc aligned 16
267 cycles for 100 * proc aligned 16+3
392 cycles for 100 * aligned push+pop
392 cycles for 100 * aligned reg32
310 cycles for 100 * proc aligned 16
266 cycles for 100 * proc aligned 16+3
397 cycles for 100 * aligned push+pop
394 cycles for 100 * aligned reg32
308 cycles for 100 * proc aligned 16
269 cycles for 100 * proc aligned 16+3
408 cycles for 100 * aligned push+pop
392 cycles for 100 * aligned reg32
314 cycles for 100 * proc aligned 16
263 cycles for 100 * proc aligned 16+3
404 cycles for 100 * aligned push+pop
399 cycles for 100 * aligned reg32
308 cycles for 100 * proc aligned 16
267 cycles for 100 * proc aligned 16+3
395 cycles for 100 * aligned push+pop
391 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
Quote from: daydreamer on July 02, 2025, 06:58:12 PM
The fairest test would be 32-bit fastcall (passing data in registers) versus 32-bit invoke (pushing data on the stack).
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
400 cycles for 100 * proc aligned 16
400 cycles for 100 * proc aligned 16+3
417 cycles for 100 * aligned push+pop
273 cycles for 100 * aligned reg32
405 cycles for 100 * proc aligned 16
409 cycles for 100 * proc aligned 16+3
426 cycles for 100 * aligned push+pop
276 cycles for 100 * aligned reg32
409 cycles for 100 * proc aligned 16
402 cycles for 100 * proc aligned 16+3
422 cycles for 100 * aligned push+pop
290 cycles for 100 * aligned reg32
403 cycles for 100 * proc aligned 16
406 cycles for 100 * proc aligned 16+3
426 cycles for 100 * aligned push+pop
278 cycles for 100 * aligned reg32
406 cycles for 100 * proc aligned 16
416 cycles for 100 * proc aligned 16+3
421 cycles for 100 * aligned push+pop
281 cycles for 100 * aligned reg32
15 bytes for proc aligned 16
19 bytes for proc aligned 16+3
24 bytes for aligned push+pop
20 bytes for aligned reg32
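
For readers who want to see the two calling styles being compared in this thread side by side, here is a minimal sketch. It is not the benchmark source that produced the timings above; AddStack, AddRegs and the constants 100 and 23 are made-up names and values for illustration. It assumes a standard MASM32 SDK install, so that masm32rt.inc is found at the usual path. The register version passes its arguments in ECX and EDX by hand, which loosely mirrors what the 32-bit Microsoft fastcall convention does for the first two arguments.

```asm
; Sketch only: AddStack and AddRegs are hypothetical procs, not the tested code.
include \masm32\include\masm32rt.inc    ; assumes a default MASM32 SDK install

.code

; Stack version (stdcall): invoke pushes both arguments,
; and the assembler emits "ret 8" so the callee cleans the stack.
AddStack proc a:DWORD, b:DWORD
    mov eax, a
    add eax, b
    ret
AddStack endp

; Register version: the caller loads ECX and EDX before the call.
; No arguments are pushed and no stack frame is built.
AddRegs proc
    lea eax, [ecx+edx]          ; eax = ecx + edx
    ret
AddRegs endp

start:
    invoke AddStack, 100, 23    ; expands to push 23 / push 100 / call AddStack

    mov ecx, 100
    mov edx, 23
    call AddRegs                ; same sum in EAX, arguments never touch the stack

    invoke ExitProcess, 0
end start
```

Both calls leave the same sum in EAX; the difference is only in how the arguments travel. A timing comparison would wrap each call in the same loop and counter, as the posted results presumably do for their own variants.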