The MASM Forum

General => The Workshop => Topic started by: jj2007 on June 14, 2012, 12:24:30 AM

Title: Application Binary Interface (ABI), calling conventions and the like
Post by: jj2007 on June 14, 2012, 12:24:30 AM
Agner Fog on using FPU and xmm regs in Win7-64: (http://www.agner.org/optimize/calling_conventions.pdf)
Quote
However, a public discussion forum quotes the following answers from Microsoft engineers
regarding this issue: "From: Program Manager in Visual C++ Group, Sent: Thursday, May
26, 2005 10:38 AM. It does preserve the state. It's the DDK page that has stale information,
which I've requested it to be changed. Let them know that the OS does preserve state of
x87 and MMX registers on context switches." and "From: Software Engineer in Windows
Kernel Group, Sent: Thursday, May 26, 2005 11:06 AM. For user threads the state of legacy
floating point is preserved at context switch. But it is not true for kernel threads.
Kernel mode drivers can not use legacy floating point instructions." (www.planetamd64.com/index.php?showtopic=3458&st=100).

The issue has finally been resolved with the long overdue publication of a more detailed ABI
for x64 Windows in the form of a document entitled "x64 Software Conventions", well hidden
in the bin directory (not the help directory) of some compiler packages. This document says:
"The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across
context switches. There is no explicit calling convention for these registers. The use of
these registers is strictly prohibited in kernel mode code." The same text has later appeared
at the Microsoft website (msdn2.microsoft.com/en-us/library/a32tsf7t(VS.80).aspx).
My tests indicate that these registers are saved correctly during task switches and thread
switches in 64-bit mode, even in an early beta version of x64 Windows.

I like the red part. It somehow implies that the very latest version of the Windows kernel uses, well, "legacy floating point instructions" :biggrin:

Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: qWord on June 14, 2012, 12:49:55 AM
I think they want to speed up context switches in kernel land. Maybe they also want to avoid the slow transcendental functions, to improve the interruptibility of kernel code.
The question is: which kind of driver needs FPU stuff? Basic FP arithmetic is still available through SSEx.
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: hutch-- on June 14, 2012, 12:57:51 AM
From memory Microsoft abandoned FPU code some time ago for 64 bit versions, over time SSE will probably do the job if they extend the maths to 128 bit. FPU code can still handle numbers in the 80 bit range but it would seem that Intel also want to shift most maths to SSE rather than the now ancient FPU.
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: qWord on June 14, 2012, 01:02:40 AM
Quote from: hutch-- on June 14, 2012, 12:57:51 AM
From memory Microsoft abandoned FPU code some time ago for 64 bit versions, over time SSE will probably do the job if they extend the maths to 128 bit.
Only kernel code is affected. User mode applications can still use the FPU.
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: dedndave on June 14, 2012, 01:16:11 AM
Quote from: hutch-- on June 14, 2012, 12:57:51 AM
From memory Microsoft abandoned FPU code some time ago for 64 bit versions, over time SSE will probably do the job if they extend the maths to 128 bit. FPU code can still handle numbers in the 80 bit range but it would seem that Intel also want to shift most maths to SSE rather than the now ancient FPU.

maybe that implies the future intentions of intel (assuming that intel and ms collaborate)
they may intend to phase it out over the next few generations of processors
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: jj2007 on June 14, 2012, 01:26:15 AM
> Kernel mode drivers can not use legacy floating point instructions

They say "don't use them", not: "if you use them, preserve them". Who would be affected by "wrong" FPU values if not Kernel code itself? Or do I misunderstand something completely? Kernel-wise I am a noob...
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: dedndave on June 14, 2012, 01:27:46 AM
may have something to do with handling FPU exceptions
i thought you were an expert on that stuff   :biggrin:
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: qWord on June 14, 2012, 01:51:12 AM
For kernel threads, the FPU registers and status are not saved when a context switch occurs. That means the whole FPU contents can change from one instruction to the next if a context switch occurs between them.
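
To make the point above concrete, here is a toy model in C. It is only a sketch of the behaviour described in this post, not Windows internals: the names toy_thread, toy_context_switch and toy_value_after_preemption are hypothetical, and a single double stands in for the whole FPU register file.

```c
#include <stdbool.h>

/* Toy model only: one shared "FPU register" and a per-thread save slot. */
struct toy_thread {
    bool   is_user;    /* only user threads get their FPU state saved */
    double saved_st0;  /* where the scheduler parks ST0 for this thread */
};

/* The single physical register, shared by every thread on the CPU. */
static double cpu_st0;

/* Switch from `from` to `to`, saving/restoring ST0 for user threads only. */
static void toy_context_switch(struct toy_thread *from, struct toy_thread *to)
{
    if (from->is_user)
        from->saved_st0 = cpu_st0;
    if (to->is_user)
        cpu_st0 = to->saved_st0;
}

/* Load `mine`, get preempted by a user thread that loads `theirs`,
   resume, and return whatever is now in the register. */
double toy_value_after_preemption(bool is_user, double mine, double theirs)
{
    struct toy_thread self  = { is_user, 0.0 };
    struct toy_thread other = { true, 0.0 };

    cpu_st0 = mine;                     /* think: fld mine */
    toy_context_switch(&self, &other);  /* preempted */
    cpu_st0 = theirs;                   /* the other thread uses the FPU */
    toy_context_switch(&other, &self);  /* resumed */
    return cpu_st0;                     /* think: fstp result */
}
```

With is_user set to true the function hands back `mine`; with false it hands back `theirs`, which is the "contents can change from one instruction to the next" case.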
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: jj2007 on June 14, 2012, 04:00:27 AM
OK, got it, thanks :t
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: Zen on June 14, 2012, 05:20:46 AM
This is mind-boggling,...
As if kernel-mode programming wasn't confusing enough already,...
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: jj2007 on June 14, 2012, 06:12:28 AM
I've done some tests checking how much it costs to save & restore the xmm regs that Win7-64 so mercilessly trashes:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
89      cycles for fxsave
83      cycles for fxrstor
152     cycles for fsave
113     cycles for frstor

89      cycles for fxsave
83      cycles for fxrstor
152     cycles for fsave
113     cycles for frstor


172 cycles (fxsave plus fxrstor) on my puter. That looks like a lot, but in practice they are needed only around some, probably utterly slow, Windows API calls.
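
For anyone who wants to try a similar measurement from C, below is a rough sketch. It is not the test program that produced the numbers above. It assumes GCC or Clang on x86 (inline asm plus the __builtin_ia32_rdtsc builtin), and the names fx_area, fx_supported and fx_pair_ticks are made up for this example. It times an fxsave + fxrstor pair together and does no serialization or overhead subtraction, so treat the result as a ballpark figure in timestamp-counter ticks, which on newer CPUs are not the same thing as core cycles.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define FX_SUPPORTED 1
#else
#define FX_SUPPORTED 0
#endif

/* fxsave/fxrstor need a 512-byte memory operand aligned to 16 bytes. */
struct fx_area {
    _Alignas(16) unsigned char bytes[512];
};

static struct fx_area area;

/* Returns 1 when this build can actually execute fxsave/fxrstor. */
int fx_supported(void)
{
    return FX_SUPPORTED;
}

/* Average timestamp-counter ticks for one fxsave + fxrstor pair.
   Returns 0 when unsupported or when iterations is 0. */
uint64_t fx_pair_ticks(unsigned iterations)
{
#if FX_SUPPORTED
    if (iterations == 0)
        return 0;

    uint64_t start = __builtin_ia32_rdtsc();
    for (unsigned i = 0; i < iterations; ++i) {
        /* Save the x87/MMX/SSE state, then restore that same image. */
        __asm__ volatile("fxsave %0" : "=m"(area) : : "memory");
        __asm__ volatile("fxrstor %0" : : "m"(area) : "memory");
    }
    uint64_t stop = __builtin_ia32_rdtsc();

    return (stop - start) / iterations;
#else
    (void)iterations;
    return 0;
#endif
}
```

Calling fx_pair_ticks(1000000) a few times and taking the smallest value is one simple way to reduce noise from interrupts and task switches.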
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: dedndave on June 14, 2012, 01:29:00 PM
prescott w/htt - XP MCE2005 SP3
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
162     cycles for fxsave
243     cycles for fxrstor
530     cycles for fsave
576     cycles for frstor

158     cycles for fxsave
243     cycles for fxrstor
528     cycles for fsave
578     cycles for frstor
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: MichaelW on June 14, 2012, 03:56:54 PM
P4 Northwood w/ht XP SP3

Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE2)
87      cycles for fxsave
207     cycles for fxrstor
443     cycles for fsave
526     cycles for frstor

84      cycles for fxsave
202     cycles for fxrstor
434     cycles for fsave
529     cycles for frstor


Interesting to see the drop in IPC for the Prescott compared to the Northwood.

Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: hutch-- on June 14, 2012, 04:39:42 PM
I had similar timing results back when I had both a Northwood and a Prescott as dev boxes; the 2.8 gig Northwood was usually faster than the 3 gig Prescott and had noticeably less lag. Apparently the Prescott has a much longer pipeline.
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: jj2007 on June 14, 2012, 04:54:46 PM
Getting faster...
Interesting that fsave/frstor is always much slower, although the x variants write more data: 512 bytes to memory, versus 108 for fsave.

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
38      cycles for fxsave
73      cycles for fxrstor
166     cycles for fsave
130     cycles for frstor

38      cycles for fxsave
73      cycles for fxrstor
167     cycles for fsave
130     cycles for frstor
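
On the 512 bytes mentioned above: the following C struct is a sketch of how the fxsave image is laid out, using the form with 32-bit instruction and data pointers (with REX.W in 64-bit mode those two become 64-bit fields). The struct and field names are my own; only the offsets and sizes come from the CPU manuals. For comparison, the fsave image in 32-bit protected mode is a 28-byte environment plus eight 10-byte registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 512-byte FXSAVE/FXRSTOR memory image. */
struct fxsave_area {
    uint16_t fcw;            /* x87 control word */
    uint16_t fsw;            /* x87 status word */
    uint8_t  ftw;            /* abridged tag word */
    uint8_t  reserved0;
    uint16_t fop;            /* last x87 opcode */
    uint32_t fip;            /* last instruction pointer offset */
    uint16_t fcs;            /* last instruction pointer selector */
    uint16_t reserved1;
    uint32_t fdp;            /* last data pointer offset */
    uint16_t fds;            /* last data pointer selector */
    uint16_t reserved2;
    uint32_t mxcsr;          /* SSE control/status register */
    uint32_t mxcsr_mask;
    uint8_t  st_mm[8][16];   /* ST0-ST7 / MM0-MM7, 10 bytes used per slot */
    uint8_t  xmm[16][16];    /* XMM0-XMM15 (XMM0-XMM7 outside 64-bit mode) */
    uint8_t  reserved3[96];  /* reserved / available to software */
};

/* FSAVE image in 32-bit protected mode: environment + 8 x 80-bit registers. */
enum { FSAVE_AREA_BYTES_32BIT = 28 + 8 * 10 };

_Static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE image is 512 bytes");
_Static_assert(offsetof(struct fxsave_area, st_mm) == 32, "ST/MM slots start at 32");
_Static_assert(offsetof(struct fxsave_area, xmm) == 160, "XMM slots start at 160");
```

So the x variants move almost five times as much data as fsave/frstor and still come out faster in every result posted so far.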
Title: Re: Application Binary Interface (ABI), calling conventions and the like
Post by: FORTRANS on June 14, 2012, 10:28:20 PM
Hi,

   For some reason, it crashed on my P-MMX.  (Right,
like I expected an error message?  <g>)  It did run
on my Sony, so the CPUID code is not what locked it
up earlier.  (Presumably.)  Though there was an odd
pause the first time I ran this one.


pre-P4 (SSE1)
61   cycles for fxsave
63   cycles for fxrstor
108   cycles for fsave
86   cycles for frstor

61   cycles for fxsave
63   cycles for fxrstor
108   cycles for fsave
86   cycles for frstor


--- ok --- Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
61   cycles for fxsave
63   cycles for fxrstor
116   cycles for fsave
92   cycles for frstor

61   cycles for fxsave
63   cycles for fxrstor
117   cycles for fsave
93   cycles for frstor


--- ok ---


Cheers,

Steve N.