Masm32 SDK description, downloads and other helpful links
Started by coder, June 13, 2017, 12:59:16 PM
Quote from: sinsi on June 14, 2017, 01:49:34 PM
Don't forget that in 64-bit you can use the spill space for any register storage, not just RCX/RDX etc. I have seen Windows APIs store RBX (and even XMM0) at [rsp+8]. My main bugbear with "mov [rsp+xx]" is the code size, e.g. CreateWindowEx has 12 parameters, using push cuts down on code size. Not so much of a problem in 32-bit land.
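To put a rough number on the code-size point, here is a minimal sketch in NASM syntax. The Linux x86-64 build and the labels push_start, mov_start, push_len and mov_len are assumptions made for illustration, not anything posted in the thread. It assembles one push and one mov to the stack, never executes either, and returns the difference of their encoded lengths as the exit status.

```nasm
; Assumed build: nasm -f elf64 size.asm -o size.o && ld size.o -o size
; Run:           ./size ; echo $?
global _start
section .text

; These two instructions sit before the entry point and are never executed.
; They are assembled only so that their encoded sizes can be measured.
push_start:
    push rbx                      ; encodes as 53
push_len equ $ - push_start       ; length of the push in bytes

mov_start:
    mov  [rsp+8], rbx             ; encodes as 48 89 5C 24 08
mov_len  equ $ - mov_start        ; length of the mov in bytes

_start:
    mov  edi, mov_len - push_len  ; exit status = size difference in bytes
    mov  eax, 60                  ; Linux x86-64 sys_exit
    syscall
```

With a legacy register such as RBX the push is 1 byte and the mov with an 8-bit displacement is 5 bytes, so the status should be 4. Pushing R8 to R15 takes 2 bytes because of the REX prefix, and immediates or larger displacements change both figures, so treat this as one data point rather than a general rule.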
Quote from: hutch-- on June 14, 2017, 12:21:01 PM
The thing going against using push/pop in 64 bit is alignment. It may be a familiar technique from 32 bit and earlier but with the Microsoft ABI, you are then stuck with manual stack twiddling to get procedures to work. The 64 bit alignment of arguments written to the stack after using the first 4 registers in the ABI can handle BYTE, WORD, DWORD and QWORD without having to change the RSP stack pointer as each is written to a stack memory location that is 64 bit aligned.
It may be character building to play with manual stack adjustments just to get a procedure to run but if reliable code is the target, the last thing you want is pissing around aligning the stack just to get a procedure to run. For slightly more typing, if you create a local for each register you need to preserve and later restore and copy the content into the register on the way in and vice versa on the way out, you get direct register / memory writes both ways without messing up the stack. It looks like this.
    LOCAL .r12 :QWORD
    LOCAL .r13 :QWORD
    LOCAL .r14 :QWORD
    LOCAL .r15 :QWORD
    mov .r12, r12
    mov .r13, r13
    mov .r14, r14
    mov .r15, r15
    ; socket 2 'em
    mov r12, .r12
    mov r13, .r13
    mov r14, .r14
    mov r15, .r15
    ret
Ah look mum, no stack twiddling. :P
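The alignment problem shows up after a single push. What follows is a minimal sketch in NASM syntax, assuming a Linux x86-64 build rather than Windows: the System V ABI guarantees a 16-byte aligned RSP at process entry, and both that ABI and the Microsoft one expect 16-byte alignment at a call. The program pushes one register and returns the low four bits of RSP as the exit status.

```nasm
; Assumed build: nasm -f elf64 align.asm -o align.o && ld align.o -o align
; Run:           ./align ; echo $?
global _start
section .text

_start:
    ; At process entry RSP is a multiple of 16 (System V AMD64 ABI).
    push rbx            ; one 8-byte push moves RSP down by 8
    mov  rdi, rsp       ; copy the stack pointer
    and  edi, 15        ; keep only RSP mod 16
    pop  rbx            ; undo the push
    mov  eax, 60        ; Linux x86-64 sys_exit
    syscall             ; exit status = RSP mod 16 after the push
```

The status should be 8, meaning a call made at that point would see a misaligned stack unless a second push or a manual sub rsp, 8 is added. Writing to pre-reserved locals with mov leaves RSP untouched, which is the point of the LOCAL approach in the post above.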
admin@mint ~/nasm $ ./time64
PUSH = 425
MOV = 284
admin@mint ~/nasm $ ./time64
PUSH = 423
MOV = 282
admin@mint ~/nasm $ ./time64
PUSH = 426
MOV = 281
admin@mint ~/nasm $ ./time64
PUSH = 419
MOV = 284
admin@mint ~/nasm $ ./time64
PUSH = 426
MOV = 282
admin@mint ~/nasm $ ./time64
PUSH = 423
MOV = 284
Quote from: coder on June 14, 2017, 03:39:54 PM
On a different PC, with completely different results
Quote from: hutch-- on June 14, 2017, 04:29:31 PM
"mov" just transfers data where "push" must transfer data AND update the stack pointer.
Quote from: jj2007 on June 14, 2017, 08:22:20 AM
Steve, I rarely use the timeit >results.txt version, but I wouldn't have suggested that without testing it OK, on Win7-64. Now I have also tested it on WinXP SP3 and Win 10.0.15063 - no problems. Which Windows version are you using?
Quote from: hutch-- on June 14, 2017, 04:29:31 PM
> MOV is faster all the way doesn't matter how many writes are performed if compared to PUSHes.
This is normally the case, "mov" just transfers data where "push" must transfer data AND update the stack pointer.
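The "transfer data AND update the stack pointer" description can be spelled out as two explicit instructions. This is a minimal sketch in NASM syntax, again assuming a Linux x86-64 build. The test value and the labels .differ and .done are made up for illustration. It performs a push, records what it did, repeats the same work with sub and mov, and exits with status 0 if both leave the same RSP and the same stored value.

```nasm
; Assumed build: nasm -f elf64 pushmov.asm -o pushmov.o && ld pushmov.o -o pushmov
; Run:           ./pushmov ; echo $?
global _start
section .text

_start:
    mov  rax, 0x1122334455667788  ; arbitrary test value

    push rax                ; implicit: RSP -= 8, then store RAX at [RSP]
    mov  rbx, rsp           ; remember where the push left RSP
    mov  rcx, [rsp]         ; remember what the push stored
    add  rsp, 8             ; undo the push

    sub  rsp, 8             ; explicit stack pointer update ...
    mov  [rsp], rax         ; ... followed by a plain store

    xor  edi, edi           ; exit status 0 = both sequences agree
    cmp  rbx, rsp           ; same stack pointer?
    jne  .differ
    cmp  rcx, [rsp]         ; same stored value?
    je   .done
.differ:
    mov  edi, 1             ; exit status 1 = mismatch
.done:
    add  rsp, 8             ; restore RSP
    mov  eax, 60            ; Linux x86-64 sys_exit
    syscall
```

One visible difference is that sub changes the flags while push does not. How the two forms compare in speed depends on the processor, as the differing results reported in this thread suggest.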
Quote from: coder on June 15, 2017, 03:33:42 PM
Looking at the PUSH definition, I see lots of other chores involved. Worse with POP.
Quote from: jj2007 on June 15, 2017, 05:08:36 PM
> Quote from: coder on June 15, 2017, 03:33:42 PM
> Looking at the PUSH definition, I see lots of other chores involved. Worse with POP.
More precisely? And how is that different to rep movsd, the fastest way to copy values mem to mem?
Quote from: hutch-- on June 15, 2017, 04:33:10 PM
It may be worth the effort to look up the effects of running 32 bit code on a 64 bit processor as it is run in "legacy" mode and may affect the efficiency of the 32 bit code.
Quote
We are talking about the stack in a much more general sense, not specific or selected processors. Whether a processor implements dedicated circuitry to make certain instructions fast (like rep movsd) is really not my concern at this moment.